Mail-Assistant — PRD

Product spec, JTBD, scope, milestones, and success metrics for Mail-Assistant.

Size M · P0 · Vertical app

Status: ✅ Shipped (2-day sprint); see Implementation for build details

Originally planned: 1 weekend / Actual: 2 days of concentrated work

1. Problem

I get 60–120 emails a day across 3 accounts (work + 2 personal). Roughly 8% deserve a human reply within 24h; the remaining 92% is share-notifications, calendar invites, marketing, bots, and threads that already resolved themselves on other channels (Jira/Slack/phone replies).

The pain is not “too many emails.” The pain is decision fatigue per row: every single message asks me to classify, prioritize, draft a reply, file it, snooze it. An ADHD brain treats that as 60 simultaneous open loops, which costs ~90 minutes of high-energy morning attention before any real work happens.

Pain: ~90 minutes/day of unpaid attention labor, repeatable every morning, with the worst outcome (missing a real P0) still happening because the signal drowns in noise.

Why now: cross-channel context (Jira status, Sent-folder staleness, share-notification heuristics) is now cheap to assemble, and a classifier that costs well under a cent per thread (Haiku 4.5) can run on every thread without cost becoming a concern.

2. Goal & Success Metrics

Goal: Open laptop at 8am → see 3–7 rows that actually need a reply today, with the action verb already written into each row. Done in under 10 minutes.

Metrics (actuals as of week 4):

| Metric | Target | Achieved | Note |
|---|---|---|---|
| Daily triage time | ≤ 15 min | ~8 min | Median across 14 weekdays |
| False-positive P0/day | ≤ 3 | 2–3 | Down from 7–9 baseline |
| Threads surfaced/day | ≤ 10 | 7 | Out of 568 total messages |
| Classifier cost/month | ≤ $5 | ~$1.20 | Haiku 4.5, ~2K threads/month |
| App surfaces | 1 | 1 (Mac only) | Killed Slack digest, mobile push |

3. JTBD

When I open my laptop at 8am and see 60+ unread threads, I want to know which 3–7 I actually need to reply to today, so that I don’t burn 90 minutes triaging before any real work happens.

4. User journey

  1. IMAP poller pulls new messages from 3 accounts every 5 minutes → upserts into threads.
  2. Classifier batch fires every 5 minutes → for each new thread:
    • Run share-notification hard-rule (NOISE if matched)
    • Look up linked Jira ticket → if Done/Closed, downgrade
    • Check Sent folder → if I replied in the last 30 minutes, downgrade
    • Call Haiku 4.5 with only the latest reply + cross-channel context
    • Write (class, action_summary, reason) into classifications
  3. I open Mail-Assistant.app → see 3 collapsible sections (P0 / P1 / P2). NOISE never appears.
  4. Press D (Done) or A (Archive) on a row → backend writes to actions → sync timer propagates to Gmail (archive INBOX + apply Mail-Assistant/Done label).
  5. NOISE auto-archives via the sync timer — never enters the surface.
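
The per-thread order in step 2 can be sketched roughly as below. This is a minimal illustration, not the shipped classifier: `Thread`, `Classification`, `classify`, and the `model` callable are hypothetical names, and it assumes "downgrade" means the Jira and Sent-folder findings are handed to the model as hints rather than applied as a fixed rule afterwards.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List, Optional


@dataclass
class Thread:
    latest_reply: str                         # newest message only, history dropped
    is_share_notification: bool               # outcome of the hard-rule
    jira_status: Optional[str] = None         # None when no ticket is linked
    last_sent_at: Optional[datetime] = None   # my latest reply in Sent, if any


@dataclass
class Classification:
    cls: str              # NOISE / P0 / P1 / P2
    action_summary: str
    reason: str


# Stand-in for the Haiku call: (latest reply, hints) -> Classification.
Model = Callable[[str, List[str]], Classification]


def classify(thread: Thread, now: datetime, model: Model) -> Classification:
    # Hard-rule first: matched share-notifications never reach the model.
    if thread.is_share_notification:
        return Classification("NOISE", "", "share-notification hard-rule")

    # Collect cross-channel hints that argue for a downgrade.
    hints: List[str] = []
    if thread.jira_status in ("Done", "Closed"):
        hints.append(f"linked Jira ticket is {thread.jira_status}")
    if (thread.last_sent_at is not None
            and now - thread.last_sent_at <= timedelta(minutes=30)):
        hints.append("I already replied in the last 30 minutes")

    # The model sees only the latest reply plus the hints.
    return model(thread.latest_reply, hints)
```

Passing the model in as a callable keeps the ordering testable without spending tokens: a stub can stand in for the real call while the rules around it are checked.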

5. Scope (MoSCoW) — final

Must — DONE:

  • ✅ IMAP pull for 3 accounts (work Gmail + 2 personal)
  • ✅ Per-thread classification into NOISE / P0 / P1 / P2 (NOISE = default)
  • ✅ Cross-channel verify: Jira ticket status + Sent-folder staleness
  • ✅ Native macOS UI with 3 collapsible sections
  • ✅ Done / Archive actions, with Gmail label sync
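
As a rough sketch of the label sync, the function below builds the request body that Gmail's `threads.modify` / `messages.modify` endpoints accept (`addLabelIds`, `removeLabelIds`). The function name is hypothetical, `done_label_id` stands for whatever ID Gmail assigned to the Mail-Assistant/Done label, and it assumes Archive only removes `INBOX` while Done also applies the label. The shipped behaviour may differ.

```python
from typing import Dict, List

INBOX = "INBOX"  # Gmail's system label ID for the inbox


def gmail_modify_body(action: str, done_label_id: str) -> Dict[str, List[str]]:
    """Translate a UI action into a Gmail modify request body."""
    if action == "done":
        # Leave the inbox and mark the thread as handled.
        return {"addLabelIds": [done_label_id], "removeLabelIds": [INBOX]}
    if action == "archive":
        # Assumption: archive only removes the thread from the inbox.
        return {"removeLabelIds": [INBOX]}
    raise ValueError(f"unknown action: {action!r}")
```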

Should — DONE:

  • ✅ Share-notification hard-rule (Sheets/Notion/Dropbox/Figma/Calendly)
  • ✅ Thread-latest-only classification (drop history)
  • ✅ Action-verb row title ("Reply to <sender> re: <topic>", not raw subject)
  • ✅ Keyboard shortcuts (D / A / J / K)
  • ⏸️ Auto-draft reply — out of scope; surfacing is the actual product
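
A share-notification hard-rule could look something like the sketch below. The domains and subject phrases are placeholders chosen for illustration, not the shipped rule set, and `is_share_notification` is a hypothetical name. The point is only that a cheap sender-plus-subject check can mark a thread NOISE before any model call.

```python
import re
from email.utils import parseaddr

# Placeholder rules: illustrative only, not the real rule set.
SHARE_DOMAINS = ("docs.google.com", "notion.so", "dropbox.com",
                 "figma.com", "calendly.com")
SHARE_SUBJECT = re.compile(
    r"\b(shared .+ with you|invited you to|commented on)\b", re.IGNORECASE
)


def is_share_notification(from_header: str, subject: str) -> bool:
    """True when both the sender domain and the subject look like a share notice."""
    _, address = parseaddr(from_header)
    domain = address.rpartition("@")[2].lower()
    from_share_service = any(
        domain == d or domain.endswith("." + d) for d in SHARE_DOMAINS
    )
    return from_share_service and bool(SHARE_SUBJECT.search(subject))
```

Requiring both conditions is a deliberate choice in this sketch: a billing email from one of these services, or a colleague writing "shared notes with you" from their own address, still reaches the classifier.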

Could — partial:

  • ⏸️ Mobile companion app — explicitly dropped (single surface principle)
  • ⏸️ Slack digest channel — dropped, would add noise back
  • ⏸️ Snooze button — dropped, “Maybe Later” is anxiety-as-a-feature
  • ⏸️ Smart unsubscribe — out of scope, share-noti rules cover 22% of volume

Won’t:

  • Multi-user / team triage (single-user system by design)
  • Outbound send-from-app (security boundary; Gmail keeps that role)
  • Calendar event creation (separate surface, not triage’s job)

6. Tech stack — final choices

| Layer | Original spec | Implemented | Reason for change |
|---|---|---|---|
| Pull | Gmail API only | IMAP (3 accounts) | IMAP works uniformly across 3 different providers |
| Classifier | Local Llama 3.2 3B | Claude Haiku 4.5 | Local model false-positive on share-notis; Haiku gets the prompt right at $0.04/day |
| DB | SQLite | Postgres 16 | Concurrent writes from 3 timers + UI reads |
| Backend | Standalone script | FastAPI + systemd timers | API for the Mac app + structured timer split |
| UI | Electron + React | SwiftUI native | Electron felt heavy for a daily-driver tool; NSPanel + Liquid Glass matches the rest of the desktop |
| Surface | Mac app + Slack digest + iOS push | Mac app only | Single-surface decision = the actual product win |
| Endpoint | Local-only | Cloudflare named tunnel | Lets me triage from a different laptop occasionally; bearer + optional CF Access |
| Action sync | Manual | Gmail API label + archive | Done/Archive propagates back to Gmail so the inbox stays clean if I check it elsewhere |

7. Milestones — actual

| Day | What shipped |
|---|---|
| Day 1 AM | IMAP poller + Postgres schema (threads, classifications, actions) + naive 4-class classifier |
| Day 1 PM | NOISE-default reframe + cross-channel verify (Jira REST + Sent folder) |
| Day 2 AM | Thread-latest-only + share-notification hard-rule + action-verb summary |
| Day 2 midday | SwiftUI app: 3 collapsible sections, Done/Archive shortcuts |
| Day 2 PM | Gmail action sync + Cloudflare named tunnel + bearer auth |
| Day 2 eve | Deployment to cron-host VM, smoke test on 568-message backlog |

DoD passed:

  • ✅ Triage time ≤ 15 min (achieved 8 min after 4 weeks of natural usage)
  • ✅ False-positive P0 ≤ 3/day (achieved 2–3)
  • ✅ Cost ≤ $5/month (achieved $1.20)

8. Cost & quota

| Item | Free tier? | Actual usage |
|---|---|---|
| Compute VM (1 vCPU / 1 GB) | varies | shared with other small services |
| Postgres 16 (same VM) | n/a | ~80 MB after 3 months |
| Cloudflare Tunnel | | <1 MB/day |
| Claude Haiku 4.5 | pay-as-you-go | ~$1.20/month |
| Gmail API | | well under quota (label + archive only) |
| Jira REST | | per-account quota fine |
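
The classifier figures quoted in this PRD ($0.04/day, ~$1.20/month, ~2K threads/month, $5 budget) can be cross-checked with a few lines of arithmetic. This is a back-of-envelope sketch that assumes cost scales linearly with thread count, which real token usage will only approximate.

```python
# Figures quoted in this PRD.
DAILY_COST_USD = 0.04
MONTHLY_COST_USD = 1.20
THREADS_PER_MONTH = 2000
BUDGET_USD = 5.00

days_covered = MONTHLY_COST_USD / DAILY_COST_USD          # about 30 days
cost_per_thread = MONTHLY_COST_USD / THREADS_PER_MONTH    # about $0.0006
threads_within_budget = BUDGET_USD / cost_per_thread      # about 8,333 threads/month

print(f"{days_covered:.0f} days, ${cost_per_thread:.4f}/thread, "
      f"{threads_within_budget:,.0f} threads/month fit in the budget")
```

Under that linear assumption, volume would need to roughly quadruple before the $5/month target is at risk.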

9. Risks & open questions — outcomes

Risks before build:

  • Classifier cost runaway → mitigated by NOISE-default + share-noti hard-rules (drops 22% of volume before any LLM call)
  • False-negative P0 (a real urgent thing classified NOISE) → mitigated by P1 = “uncertain but plausible”; manually reviewed for 2 weeks, no misses
  • Gmail API auth drift → mitigated by refresh-token rotation in the sync timer

Risk that bit me mid-build:

  • Reclassify cost trap: during pivots, re-running all 568 messages through each new prompt to validate it burned ~$30–40 in tokens over 2 days. Lesson: always test on a 30-row subset before a full sweep. See Notes.
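
One way to make the subset-first lesson routine is sketched below: draw a fixed 30-row sample, run the new prompt on that alone, then extrapolate what the full sweep would cost before committing to it. Both helper names are hypothetical, and the extrapolation assumes the sample's per-row cost is representative of the backlog.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def pick_subset(rows: Sequence[T], size: int = 30, seed: int = 0) -> List[T]:
    """Draw a reproducible sample so successive prompts are compared on the same rows."""
    rng = random.Random(seed)
    return rng.sample(list(rows), min(size, len(rows)))


def estimate_sweep_cost(subset_cost_usd: float, subset_size: int,
                        total_rows: int) -> float:
    """Linear extrapolation from the measured subset cost to the full sweep."""
    return subset_cost_usd / subset_size * total_rows
```

The fixed seed matters here: if each prompt revision sees a different sample, a change in results could come from the rows rather than the prompt.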

Open Qs at start:

  • Q1: Web UI instead of native? → ❌ rejected; Mac app is the daily driver, mobile triage is explicitly anti-goal
  • Q2: Should NOISE be visible at all? → ❌ no — surfacing NOISE re-creates the original problem
  • Q3: Auto-draft replies? → ⏸️ deferred; surfacing is the actual product, drafting is a separate JTBD

10. Definition of Done

Shipped: ✅ — backend on cron-host VM with 3 timers + Postgres; SwiftUI app on operator’s Mac; Cloudflare tunnel reachable; 4 weeks of natural usage at target metrics.

Future polish (deferred):

  • ⏳ Retro-style UI pass to match other side-project aesthetics
  • ⏳ Per-account triage budget (e.g. cap personal at 5 rows/day)
  • ⏳ Auto-draft for routine replies (newsletter cancellations, “thanks, received”)

See also

  • Architecture — component diagrams, data flow, security model
  • Implementation — schema, classifier prompt, perf numbers
  • Notes — chronological decision log + the reclassify cost trap
  • Blog post — PM lens on the 5 product decisions