Mail-Assistant — PRD
Size M · P0 · Vertical app Status: ✅ Shipped (2-day sprint) — see Implementation for build details Originally planned: 1 weekend / Actual: 2 days concentrated work
1. Problem
I get 60–120 emails a day across 3 accounts (work + 2 personal). Roughly 8% deserve a human reply within 24h; the remaining 92% is share-notifications, calendar invites, marketing, bots, and threads that already resolved themselves on other channels (Jira/Slack/phone replies).
The pain is not “too many emails.” The pain is decision fatigue per row: every single message asks me to classify, prioritize, draft a reply, file it, snooze it. An ADHD brain treats that as 60 simultaneous open loops, which costs ~90 minutes of high-energy morning attention before any real work happens.
Pain: ~90 minutes/day of unpaid attention labor, repeatable every morning, with the worst outcome (missing a real P0) still happening because the signal drowns in noise.
Why now: cross-channel context (Jira status, Sent-folder staleness, share-notification heuristics) is now cheap to assemble — and a sub-cent classifier (Haiku 4.5) can run on every thread without hitting cost concerns.
2. Goal & Success Metrics
Goal: Open laptop at 8am → see 3–7 rows that actually need a reply today, with the action verb already written into each row. Done in under 10 minutes.
Metrics — actual achieved (week 4):
| Metric | Target | Achieved | Note |
|---|---|---|---|
| Daily triage time | ≤ 15 min | ~8 min | Median across 14 weekdays |
| False-positive P0/day | ≤ 3 | 2–3 | Down from 7–9 baseline |
| Threads surfaced/day | ≤ 10 | 7 | Out of 568 total messages |
| Classifier cost/month | ≤ $5 | ~$1.20 | Haiku 4.5, ~2K threads/month |
| App surfaces | 1 | 1 (Mac only) | Killed Slack digest, mobile push |
3. JTBD
When I open my laptop at 8am and see 60+ unread threads, I want to know which 3–7 I actually need to reply to today, so that I don’t burn 90 minutes triaging before any real work happens.
4. User journey
- IMAP poller pulls new messages from 3 accounts every 5 minutes → upserts into
threads. - Classifier batch fires every 5 minutes → for each new thread:
- Run share-notification hard-rule (NOISE if matched)
- Look up linked Jira ticket → if
Done/Closed, downgrade - Check Sent folder → if I replied in the last 30 minutes, downgrade
- Call Haiku 4.5 with only the latest reply + cross-channel context
- Write
(class, action_summary, reason)intoclassifications
- I open Mail-Assistant.app → see 3 collapsible sections (P0 / P1 / P2). NOISE never appears.
- Press
D(Done) orA(Archive) on a row → backend writes toactions→ sync timer propagates to Gmail (archive INBOX + applyMail-Assistant/Donelabel). - NOISE auto-archives via the sync timer — never enters the surface.
5. Scope (MoSCoW) — final
Must — DONE:
- ✅ IMAP pull for 3 accounts (work Gmail + 2 personal)
- ✅ Per-thread classification into NOISE / P0 / P1 / P2 (NOISE = default)
- ✅ Cross-channel verify: Jira ticket status + Sent-folder staleness
- ✅ Native macOS UI with 3 collapsible sections
- ✅ Done / Archive actions, with Gmail label sync
Should — DONE:
- ✅ Share-notification hard-rule (Sheets/Notion/Dropbox/Figma/Calendly)
- ✅ Thread-latest-only classification (drop history)
- ✅ Action-verb row title (
"Reply to <sender> re: <topic>", not raw subject) - ✅ Keyboard shortcuts (
D/A/J/K) - ⏸️ Auto-draft reply — out of scope; surfacing is the actual product
Could — partial:
- ⏸️ Mobile companion app — explicitly dropped (single surface principle)
- ⏸️ Slack digest channel — dropped, would add noise back
- ⏸️ Snooze button — dropped, “Maybe Later” is anxiety-as-a-feature
- ⏸️ Smart unsubscribe — out of scope, share-noti rules cover 22% of volume
Won’t:
- Multi-user / team triage (single-user system by design)
- Outbound send-from-app (security boundary; Gmail keeps that role)
- Calendar event creation (separate surface, not triage’s job)
6. Tech stack — final choices
| Layer | Original spec | Implemented | Reason for change |
|---|---|---|---|
| Pull | Gmail API only | IMAP (3 accounts) | IMAP works uniformly across 3 different providers |
| Classifier | Local Llama 3.2 3B | Claude Haiku 4.5 | Local model false-positive on share-notis; Haiku gets the prompt right at $0.04/day |
| DB | SQLite | Postgres 16 | Concurrent writes from 3 timers + UI reads |
| Backend | Standalone script | FastAPI + systemd timers | API for the Mac app + structured timer split |
| UI | Electron + React | SwiftUI native | Electron felt heavy for a daily-driver tool; NSPanel + Liquid Glass matches the rest of the desktop |
| Surface | Mac app + Slack digest + iOS push | Mac app only | Single-surface decision = the actual product win |
| Endpoint | Local-only | Cloudflare named tunnel | Lets me triage from a different laptop occasionally; bearer + optional CF Access |
| Action sync | Manual | Gmail API label + archive | Done/Archive propagates back to Gmail so the inbox stays clean if I check it elsewhere |
7. Milestones — actual
| Day | What shipped |
|---|---|
| Day 1 AM | IMAP poller + Postgres schema (threads, classifications, actions) + naive 4-class classifier |
| Day 1 PM | NOISE-default reframe + cross-channel verify (Jira REST + Sent folder) |
| Day 2 AM | Thread-latest-only + share-notification hard-rule + action-verb summary |
| Day 2 midday | SwiftUI app — 3 collapsible sections, Done/Archive shortcuts |
| Day 2 PM | Gmail action sync + Cloudflare named tunnel + bearer auth |
| Day 2 eve | Deployment to cron-host VM, smoke test on 568-message backlog |
DoD passed:
- ✅ Triage time ≤ 15 min (achieved 8 min after 4 weeks of natural usage)
- ✅ False-positive P0 ≤ 3/day (achieved 2–3)
- ✅ Cost ≤ $5/month (achieved $1.20)
8. Cost & quota
| Item | Free tier? | Actual usage |
|---|---|---|
| Compute VM (1 vCPU / 1 GB) | varies | shared with other small services |
| Postgres 16 (same VM) | n/a | ~80 MB after 3 months |
| Cloudflare Tunnel | ✅ | <1 MB/day |
| Claude Haiku 4.5 | pay-as-you-go | ~$1.20/month |
| Gmail API | ✅ | well under quota (label + archive only) |
| Jira REST | ✅ | per-account quota fine |
9. Risks & open questions — outcomes
Risks before build:
- Classifier cost runaway → mitigated by NOISE-default + share-noti hard-rules (drops 22% of volume before any LLM call)
- False-negative P0 (a real urgent thing classified NOISE) → mitigated by P1 = “uncertain but plausible”; manually reviewed for 2 weeks, no misses
- Gmail API auth drift → mitigated by refresh-token rotation in the sync timer
Risk that bit us mid-build:
- Reclassify cost trap — during pivots, sweeping all 568 messages through a new prompt to validate burned ~$30–40 in tokens over 2 days. Lesson: always test on a 30-row subset before full sweep. See Notes.
Open Qs at start:
- Q1: Web UI instead of native? → ❌ rejected; Mac app is the daily driver, mobile triage is explicitly anti-goal
- Q2: Should NOISE be visible at all? → ❌ no — surfacing NOISE re-creates the original problem
- Q3: Auto-draft replies? → ⏸️ deferred; surfacing is the actual product, drafting is a separate JTBD
10. Definition of Done
Shipped: ✅ — backend on cron-host VM with 3 timers + Postgres; SwiftUI app on operator’s Mac; Cloudflare tunnel reachable; 4 weeks of natural usage at target metrics.
Future polish (deferred):
- ⏳ Retro-style UI pass to match other side-project aesthetics
- ⏳ Per-account triage budget (e.g. cap personal at 5 rows/day)
- ⏳ Auto-draft for routine replies (newsletter cancellations, “thanks, received”)
See also
- Architecture — component diagrams, data flow, security model
- Implementation — schema, classifier prompt, perf numbers
- Notes — chronological decision log + the reclassify cost trap
- Blog post — PM lens on the 5 product decisions