Notes & Decision Log
Format: YYYY-MM-DD — context — decision/finding.
A note on dates: the project page is dated 2025-09-15 to match the narrative arc of the public blog post that introduced the project. The deploy to the cron-host VM happened on 2026-05-18; the 2-day sprint is captured below under those dates.
Decisions
- Day 1 AM — Started with the obvious 4-class classifier (P0/P1/P2/Archive) over the most recent 568 messages. Result: 70% landed in P2 — same triage problem as before. Reframed to NOISE-as-default; classifier must justify any promotion. Same prompt structure, dramatically different surface.
- Day 1 AM — Picked Postgres 16 over SQLite. The split into 3 systemd timers + 1 API process = concurrent writers; pg handles that without LSAT-level WAL tuning.
- Day 1 PM — Added cross-channel verify (Jira REST + Sent-folder staleness). P0 candidate “Server is down, please reply ASAP”: Jira ticket already Done, replied 14 min ago from phone. Real class: NOISE. Adding 2 extra signals cut false-positive P0 from 7–9/day → 2–3/day.
- Day 1 PM — Decision: hard-rules before LLM, not as a post-filter. Share-noti regex skips the LLM entirely → 22% of daily volume is free.
- Day 2 AM — Thread-latest-only classification. Old replies bias the model toward stale states (“project on hold” 3 weeks ago when it’s now active). One row per thread, classifier sees only the most recent message. +12% accuracy on a 200-row eval.
- Day 2 AM — Action-verb row title. The row shows “Reply to vendor re: SLA breach”, not “Re: [URGENT][FW: FW:] SLA”. Constrained in the prompt as verb-first, max 60 chars.
- Day 2 AM — Killed the Snooze button. “Maybe Later” is anxiety as a feature: it surfaces the same row tomorrow with no new context. Only Done and Archive remain.
- Day 2 midday — Native SwiftUI over Electron. This is a daily-driver tool; NSPanel + Liquid Glass aesthetic, instant launch, no Chromium overhead. No telemetry, no crash reporter, no analytics.
- Day 2 midday — 3 collapsible sections (P0 / P1 / P2). NOISE never surfaces. Keyboard shortcuts: `D` (Done), `A` (Archive), `J`/`K` (next/prev).
- Day 2 PM — Killed the planned Slack digest + iOS push. Multi-surface attention is the actual problem this product solves; adding more surfaces undoes the win.
- Day 2 PM — Action sync propagates to Gmail (archive INBOX + apply `Mail-Assistant/Done` label) so the inbox stays clean if I ever check it elsewhere.
- Day 2 PM — Deployed to the cron-host VM via systemd. Cloudflare named tunnel + bearer; optional CF Access if I want to lock it to a Google identity.
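The ordering in the Day 1 decisions (hard rules first, NOISE as the default, cross-channel verify after the classifier) can be sketched as below. This is a minimal illustration, not the shipped code: the `SHARE_NOTI` pattern, the `Message` fields, and the `llm` callable are all hypothetical stand-ins, since the notes do not give the real regex or prompt.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical pattern: the real share-notification regex is not in these notes.
SHARE_NOTI = re.compile(r"\b(shared|invited you to (edit|view))\b", re.IGNORECASE)


@dataclass
class Message:
    subject: str
    body: str
    jira_done: bool = False         # assumed result of a Jira REST lookup
    replied_recently: bool = False  # assumed result of a Sent-folder check


def triage(msg: Message, llm: Callable[[Message], Optional[str]]) -> str:
    """Return one of NOISE, P0, P1, P2."""
    # 1. Hard rules run first, so matching mail never costs an LLM call.
    if SHARE_NOTI.search(msg.subject):
        return "NOISE"
    # 2. NOISE is the default; anything that is not a valid promotion stays NOISE.
    label = llm(msg)
    if label not in {"P0", "P1", "P2"}:
        return "NOISE"
    # 3. Cross-channel verify: a P0 already handled elsewhere is demoted.
    if label == "P0" and (msg.jira_done or msg.replied_recently):
        return "NOISE"
    return label
```

Running the rules as a pre-filter rather than a post-filter is what makes the skipped volume free: the classifier is never invoked for those messages.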
Gotchas
- Day 1 — IMAP thread resolution: `References:` header chains break across providers (some strip them). Fell back to a normalized-subject + 24h-window heuristic for the gaps. ~3% of threads still mis-resolve; the impact is one extra row, not a missed P0.
- Day 1 — `imaplib` default timeout = none. A hung connection blocked the timer indefinitely. Fix: explicit `socket.setdefaulttimeout(30)` + UID-resume on next fire.
- Day 1 — Postgres `UNIQUE (thread_id, classified_for_latest_uid)` was the right idempotency key. Earlier draft keyed on `thread_id` alone, which meant a new reply never re-classified. Caught by a test fixture where a new reply landed and the surface didn’t update.
- Day 2 AM — Future-tense gate: “the meeting will happen Friday” was P0-classified because of “will”. Added a date-extraction rule capping at P2 when the target date is more than 7 days out.
- Day 2 AM — Local model bake-off (Llama 3.2 3B via Ollama) — false-positive on share-notis even after prompt tuning. Haiku 4.5 got share-noti + ambiguity right out of the box at $0.04/day. Skipped local.
- Day 2 midday — SwiftUI `NSPanel` doesn’t capture keyboard input by default. Needed `becomeKey = true` + `canBecomeKeyWindow` override.
- Day 2 midday — Liquid Glass aesthetic requires `NSVisualEffectView` behind every `List` row, otherwise the blur shows the menu bar. Wrapped the whole surface in a single `BackgroundBlurView`.
- Day 2 PM — Gmail OAuth: `gmail.modify` scope is enough for label + archive; `gmail.send` was explicitly NOT requested. Avoids any “Mail-Assistant could send mail as you” prompt.
- Day 2 PM — Cloudflare bearer + tunnel: the tunnel terminates TLS at the CF edge, but the bearer header is forwarded intact. Verified end-to-end with `curl -H "Authorization: Bearer ..."`.
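A rough sketch of the Day 1 thread-resolution fallback (known `References:` IDs first, then normalized subject within 24 hours). The function names, the two lookup dictionaries, and the exact prefix-stripping regex are assumptions for illustration; the notes only describe the heuristic in words.

```python
import re
from datetime import datetime, timedelta

# Strips one leading reply/forward prefix or bracketed tag per pass.
_PREFIX = re.compile(r"^\s*((re|fw|fwd)\s*:|\[[^\]]*\])\s*", re.IGNORECASE)
WINDOW = timedelta(hours=24)


def normalize_subject(subject: str) -> str:
    """Remove stacked Re:/Fw:/[TAG] prefixes, lowercase, collapse whitespace."""
    prev = None
    while prev != subject:
        prev = subject
        subject = _PREFIX.sub("", subject, count=1)
    return " ".join(subject.lower().split())


def resolve_thread(subject, sent_at, references, by_message_id, recent):
    """Return an existing thread id, or None if the caller should start a new thread.

    by_message_id: Message-ID -> thread id (hypothetical index)
    recent: normalized subject -> (thread id, last-seen datetime) (hypothetical index)
    """
    # Preferred path: any Message-ID from the References chain we already know.
    for ref in references:
        if ref in by_message_id:
            return by_message_id[ref]
    # Fallback for stripped headers: same normalized subject inside the window.
    hit = recent.get(normalize_subject(subject))
    if hit and abs(sent_at - hit[1]) <= WINDOW:
        return hit[0]
    return None
```

A miss on both paths creates a new thread, which matches the failure mode noted above: one extra row rather than a dropped message.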
The reclassify cost trap
The single most expensive operational lesson of the 2-day sprint:
Every prompt pivot tempted a full-corpus re-sweep on all 568 messages to validate. Two days of that burned ~$30–40 in tokens before the lesson stuck.
Fix: always test prompt changes on a 30-row stratified subset (proportional NOISE/P0/P1/P2 mix from prior labels) first. Only sweep the full corpus once the subset metrics are stable. This trims a typical experiment cycle from ~$2 to ~$0.10.
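One way to draw that subset, as a sketch: sample proportionally per prior label, but keep at least one row of every class so rare labels such as P0 are not rounded away. The function name, the `(message_id, prior_label)` row shape, and the fixed seed are assumptions; because of rounding the result is about `n` rows, not always exactly `n`.

```python
import random
from collections import defaultdict


def stratified_subset(rows, n=30, seed=0):
    """rows: list of (message_id, prior_label). Returns roughly n rows with the label mix preserved."""
    rng = random.Random(seed)  # fixed seed so every prompt variant sees the same subset
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[1]].append(row)
    total = len(rows)
    picked = []
    for label, group in by_label.items():
        # Proportional share, with a floor of one row per class.
        k = max(1, round(n * len(group) / total))
        picked.extend(rng.sample(group, min(k, len(group))))
    return picked
```

Keeping the seed fixed means a metric change between two prompt versions reflects the prompt, not a different draw.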
The deeper rule: classification cost in personal AI tooling is real and additive. A throwaway script run twice a day on 568 messages with a 1¢/run model is still $7.30/year (2 runs × 365 days × $0.01), about half of what the $1.20/month target allows over a year. Subset-testing isn’t pedantic frugality; it’s the discipline that lets the project hit the $1.20/month target.
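A quick check of the budget figures quoted in these notes, in integer cents to avoid float noise. It assumes the $0.04/day Haiku figure is the steady-state cost, a 30-day month, and the rough ~$2 (full sweep) and ~$0.10 (subset) per-cycle costs; the helper names are illustrative only.

```python
def cents_per_month(cents_per_day: int, days: int = 30) -> int:
    """Steady-state monthly cost from a daily cost."""
    return cents_per_day * days


def cycles_for_budget(budget_cents: int, cents_per_cycle: int) -> int:
    """How many experiment cycles a given spend pays for."""
    return budget_cents // cents_per_cycle


print(cents_per_month(4))             # 120, i.e. the $1.20/month target
print(cycles_for_budget(3000, 200))   # 15 full sweeps for $30
print(cycles_for_budget(4000, 200))   # 20 full sweeps for $40
print(cycles_for_budget(3000, 10))    # 300 subset runs for the same $30
```

Under these assumptions the $30–40 spent during the sprint corresponds to roughly 15–20 full sweeps, and the same money would have covered hundreds of subset runs.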
Reference links
- Blog post (public PM lens): Mail-Assistant: how I cut inbox triage from 90 min to 8 min
- Anthropic Python SDK: https://github.com/anthropics/anthropic-sdk-python
- IMAP RFC 3501: https://datatracker.ietf.org/doc/html/rfc3501
- RFC 5322 (Internet Message Format): https://datatracker.ietf.org/doc/html/rfc5322
- Gmail API users.messages.modify: https://developers.google.com/gmail/api/reference/rest/v1/users.messages/modify
- Jira REST v3 issue: https://developer.atlassian.com/cloud/jira/platform/rest/v3/api-group-issues/
- Cloudflare Tunnel docs: https://developers.cloudflare.com/cloudflare-one/connections/connect-networks/
Working-session log
| Date | Hours | What | Outcome |
|---|---|---|---|
| Day 1 AM | ~3 h | IMAP poller, Postgres schema, naive 4-class classifier | First 568 messages classified, 70% landed P2 |
| Day 1 PM | ~3 h | NOISE-default reframe + cross-channel verify (Jira + Sent) | False-positive P0: 7–9/day → 2–3/day |
| Day 2 AM | ~3 h | Thread-latest-only + share-noti hard-rule + action-verb summary | LLM volume drops 22%, accuracy +12% |
| Day 2 midday | ~3 h | SwiftUI app — 3 sections, Done/Archive, shortcuts | Surface complete |
| Day 2 PM | ~2 h | Gmail action sync + tunnel + bearer | End-to-end works |
| Day 2 eve | ~1 h | Deploy to cron-host VM, smoke on backlog | 7 P0/P1 surfaced from 568 messages |
| Total | ~15 hours | 2-day sprint | Shipped, hit all DoD metrics |