Mail-Assistant — PRD

Size M · P0 · Vertical app Status: ✅ Shipped (2-day sprint) — see Implementation for build details Originally planned: 1 weekend / Actual: 2 days concentrated work

1. Problem

I get 60–120 emails a day across 3 accounts (work + 2 personal). Roughly 8% deserve a human reply within 24h; the remaining 92% is share-notifications, calendar invites, marketing, bots, and threads that already resolved themselves on other channels (Jira/Slack/phone replies).

The pain is not “too many emails.” The pain is decision fatigue per row: every single message asks me to classify, prioritize, draft a reply, file it, snooze it. An ADHD brain treats that as 60 simultaneous open loops, which costs ~90 minutes of high-energy morning attention before any real work happens.

Pain: ~90 minutes/day of unpaid attention labor, repeatable every morning, with the worst outcome (missing a real P0) still happening because the signal drowns in noise.

Why now: cross-channel context (Jira status, Sent-folder staleness, share-notification heuristics) is now cheap to assemble — and a sub-cent classifier (Haiku 4.5) can run on every thread without hitting cost concerns.

2. Goal & Success Metrics

Goal: Open laptop at 8am → see 3–7 rows that actually need a reply today, with the action verb already written into each row. Done in under 10 minutes.

Metrics — actual achieved (week 4):

Metric	Target	Achieved	Note
Daily triage time	≤ 15 min	~8 min	Median across 14 weekdays
False-positive P0/day	≤ 3	2–3	Down from 7–9 baseline
Threads surfaced/day	≤ 10	7	Out of 568 total messages
Classifier cost/month	≤ $5	~$1.20	Haiku 4.5, ~2K threads/month
App surfaces	1	1 (Mac only)	Killed Slack digest, mobile push

3. JTBD

When I open my laptop at 8am and see 60+ unread threads, I want to know which 3–7 I actually need to reply to today, so that I don’t burn 90 minutes triaging before any real work happens.

4. User journey

IMAP poller pulls new messages from 3 accounts every 5 minutes → upserts into threads.
Classifier batch fires every 5 minutes → for each new thread:
- Run share-notification hard-rule (NOISE if matched)
- Look up linked Jira ticket → if Done/Closed, downgrade
- Check Sent folder → if I replied in the last 30 minutes, downgrade
- Call Haiku 4.5 with only the latest reply + cross-channel context
- Write (class, action_summary, reason) into classifications
I open Mail-Assistant.app → see 3 collapsible sections (P0 / P1 / P2). NOISE never appears.
Press D (Done) or A (Archive) on a row → backend writes to actions → sync timer propagates to Gmail (archive INBOX + apply Mail-Assistant/Done label).
NOISE auto-archives via the sync timer — never enters the surface.

5. Scope (MoSCoW) — final

Must — DONE:

✅ IMAP pull for 3 accounts (work Gmail + 2 personal)
✅ Per-thread classification into NOISE / P0 / P1 / P2 (NOISE = default)
✅ Cross-channel verify: Jira ticket status + Sent-folder staleness
✅ Native macOS UI with 3 collapsible sections
✅ Done / Archive actions, with Gmail label sync

Should — DONE:

✅ Share-notification hard-rule (Sheets/Notion/Dropbox/Figma/Calendly)
✅ Thread-latest-only classification (drop history)
✅ Action-verb row title ("Reply to <sender> re: <topic>", not raw subject)
✅ Keyboard shortcuts (D / A / J / K)
⏸️ Auto-draft reply — out of scope; surfacing is the actual product

Could — partial:

⏸️ Mobile companion app — explicitly dropped (single surface principle)
⏸️ Slack digest channel — dropped, would add noise back
⏸️ Snooze button — dropped, “Maybe Later” is anxiety-as-a-feature
⏸️ Smart unsubscribe — out of scope, share-noti rules cover 22% of volume

Won’t:

Multi-user / team triage (single-user system by design)
Outbound send-from-app (security boundary; Gmail keeps that role)
Calendar event creation (separate surface, not triage’s job)

6. Tech stack — final choices

Layer	Original spec	Implemented	Reason for change
Pull	Gmail API only	IMAP (3 accounts)	IMAP works uniformly across 3 different providers
Classifier	Local Llama 3.2 3B	Claude Haiku 4.5	Local model false-positive on share-notis; Haiku gets the prompt right at $0.04/day
DB	SQLite	Postgres 16	Concurrent writes from 3 timers + UI reads
Backend	Standalone script	FastAPI + systemd timers	API for the Mac app + structured timer split
UI	Electron + React	SwiftUI native	Electron felt heavy for a daily-driver tool; NSPanel + Liquid Glass matches the rest of the desktop
Surface	Mac app + Slack digest + iOS push	Mac app only	Single-surface decision = the actual product win
Endpoint	Local-only	Cloudflare named tunnel	Lets me triage from a different laptop occasionally; bearer + optional CF Access
Action sync	Manual	Gmail API label + archive	Done/Archive propagates back to Gmail so the inbox stays clean if I check it elsewhere

7. Milestones — actual

Day	What shipped
Day 1 AM	IMAP poller + Postgres schema (`threads`, `classifications`, `actions`) + naive 4-class classifier
Day 1 PM	NOISE-default reframe + cross-channel verify (Jira REST + Sent folder)
Day 2 AM	Thread-latest-only + share-notification hard-rule + action-verb summary
Day 2 midday	SwiftUI app — 3 collapsible sections, Done/Archive shortcuts
Day 2 PM	Gmail action sync + Cloudflare named tunnel + bearer auth
Day 2 eve	Deployment to cron-host VM, smoke test on 568-message backlog

DoD passed:

✅ Triage time ≤ 15 min (achieved 8 min after 4 weeks of natural usage)
✅ False-positive P0 ≤ 3/day (achieved 2–3)
✅ Cost ≤ $5/month (achieved $1.20)

8. Cost & quota

Item	Free tier?	Actual usage
Compute VM (1 vCPU / 1 GB)	varies	shared with other small services
Postgres 16 (same VM)	n/a	~80 MB after 3 months
Cloudflare Tunnel	✅	<1 MB/day
Claude Haiku 4.5	pay-as-you-go	~$1.20/month
Gmail API	✅	well under quota (label + archive only)
Jira REST	✅	per-account quota fine

9. Risks & open questions — outcomes

Risks before build:

Classifier cost runaway → mitigated by NOISE-default + share-noti hard-rules (drops 22% of volume before any LLM call)
False-negative P0 (a real urgent thing classified NOISE) → mitigated by P1 = “uncertain but plausible”; manually reviewed for 2 weeks, no misses
Gmail API auth drift → mitigated by refresh-token rotation in the sync timer

Risk that bit us mid-build:

Reclassify cost trap — during pivots, sweeping all 568 messages through a new prompt to validate burned ~$30–40 in tokens over 2 days. Lesson: always test on a 30-row subset before full sweep. See Notes.

Open Qs at start:

Q1: Web UI instead of native? → ❌ rejected; Mac app is the daily driver, mobile triage is explicitly anti-goal
Q2: Should NOISE be visible at all? → ❌ no — surfacing NOISE re-creates the original problem
Q3: Auto-draft replies? → ⏸️ deferred; surfacing is the actual product, drafting is a separate JTBD

10. Definition of Done

Shipped: ✅ — backend on cron-host VM with 3 timers + Postgres; SwiftUI app on operator’s Mac; Cloudflare tunnel reachable; 4 weeks of natural usage at target metrics.

Future polish (deferred):

⏳ Retro-style UI pass to match other side-project aesthetics
⏳ Per-account triage budget (e.g. cap personal at 5 rows/day)
⏳ Auto-draft for routine replies (newsletter cancellations, “thanks, received”)

Mail-Assistant — PRD

Mail-Assistant — PRD

1. Problem

2. Goal & Success Metrics

3. JTBD

4. User journey

5. Scope (MoSCoW) — final

6. Tech stack — final choices

7. Milestones — actual

8. Cost & quota

9. Risks & open questions — outcomes

10. Definition of Done

See also