Implementation
Sister docs: PRD (intent), Architecture (system view), Notes (decision log).
TL;DR
A production-personal email triage agent built in 2 days:
- 3 IMAP accounts polled every 5 minutes → Postgres 16
- Haiku 4.5 classifier with NOISE-as-default + cross-channel verify (Jira + Sent staleness)
- Native SwiftUI macOS surface (no Electron, no mobile, no Slack digest)
- 568 daily messages → 7 P0/P1 rows surfaced
- ~$1.20/month classifier cost
- Done/Archive actions propagate back to Gmail via the sync timer
- Cloudflare named tunnel + bearer auth, optional CF Access
Stack
| Layer | Component | Version | Notes |
|---|---|---|---|
| Compute | cron-host VM | 1 vCPU / 1 GB | shared with other small services |
| OS | Debian | 12 minimal | |
| Runtime | Python | 3.11 + venv | |
| Web | FastAPI + uvicorn | latest | 127.0.0.1:8090 (tunnel-fronted) |
| DB | PostgreSQL | 16 | local socket; 80 MB after 3 months |
| Pull | imaplib + IDLE fallback | stdlib | 3 accounts, IMAPS only |
| Classifier | Anthropic Python SDK | latest | claude-haiku-4-5 |
| Action sync | google-api-python-client | latest | Gmail v1 (modify + labels) |
| Jira lookup | httpx + Jira REST v3 | latest | PAT auth |
| Tunnel | cloudflared | latest | named tunnel, systemd-managed |
| Scheduler | systemd timers | — | 3 timers, each OnCalendar=*:0/5 (or 0/2) |
| UI | Swift / SwiftUI | Xcode 15 / macOS 14+ | NSPanel + Liquid Glass aesthetic |
Directory layout
On VM (/opt/mail-assistant/)
src/
├── api.py # FastAPI app — /threads /actions /healthz (~180 lines)
├── timers/
│ ├── poll_imap.py # IMAP pull → upsert threads (~210 lines)
│ ├── classify_batch.py # share-noti + enrich + Haiku (~270 lines)
│ └── sync_actions.py # Gmail label + archive sync (~120 lines)
├── lib/
│ ├── db.py # psycopg connection pool
│ ├── jira.py # GET issue status with cache
│ ├── gmail.py # OAuth refresh + modify wrapper
│ ├── claude.py # Haiku client + JSON schema enforcement
│ ├── hardrules.py # share-noti regex bank, future-tense gate
│ └── threading.py # RFC 5322 references chain resolver
└── migrations/
├── 001_threads.sql
├── 002_classifications.sql
└── 003_actions.sql
systemd/
├── mail-assistant-api.service
├── mail-assistant-poll-imap.{service,timer}
├── mail-assistant-classify-batch.{service,timer}
└── mail-assistant-sync-actions.{service,timer}
/etc/mail-assistant/
├── secrets.env # IMAP creds, Jira PAT, Anthropic key (chmod 600)
├── gmail-token.json # OAuth refresh token (chmod 600)
└── bearer.token # Mac app bearer (chmod 600)
On operator Mac (~/Applications/Mail-Assistant.app)
MailAssistant/
├── MailAssistantApp.swift # @main, NSPanel setup
├── PanelController.swift # NSPanel + Liquid Glass styling
├── Views/
│ ├── TriageView.swift # 3 collapsible sections
│ ├── RowView.swift # action_summary + [D] [A] buttons
│ └── EmptyStateView.swift # "inbox zero" affordance
├── Models/
│ ├── Thread.swift # Codable mirror of API row
│ └── APIClient.swift # bearer auth, async/await
└── Resources/
└── Settings.swift # tunnel host + bearer (Keychain)
Schema
-- 001_threads.sql
CREATE TABLE threads (
thread_id TEXT PRIMARY KEY, -- RFC 5322-derived stable id
account TEXT NOT NULL, -- 'work' | 'personal-a' | 'personal-b'
latest_uid BIGINT NOT NULL, -- IMAP UID of latest message
latest_from TEXT NOT NULL,
latest_subject TEXT NOT NULL,
latest_snippet TEXT NOT NULL, -- first ~2000 chars of latest body
latest_date TIMESTAMPTZ NOT NULL,
jira_ticket_extracted TEXT, -- e.g. 'PROJ-1234' or NULL
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX idx_threads_account_date ON threads(account, latest_date DESC);
CREATE INDEX idx_threads_updated ON threads(updated_at);
-- 002_classifications.sql
CREATE TABLE classifications (
id BIGSERIAL PRIMARY KEY,
thread_id TEXT NOT NULL REFERENCES threads(thread_id) ON DELETE CASCADE,
classified_for_latest_uid BIGINT NOT NULL, -- matches threads.latest_uid at classify time
class TEXT NOT NULL CHECK (class IN ('NOISE','P0','P1','P2')),
action_summary TEXT, -- "Reply to vendor re: SLA breach"
reason TEXT NOT NULL, -- "share-noti" | "jira-done" | "llm:..." | "sent-fresh"
tokens_used INT,
cost_usd NUMERIC(8,5),
classified_at TIMESTAMPTZ NOT NULL DEFAULT now(),
UNIQUE (thread_id, classified_for_latest_uid) -- idempotency
);
CREATE INDEX idx_class_thread ON classifications(thread_id);
-- 003_actions.sql
CREATE TABLE actions (
id BIGSERIAL PRIMARY KEY,
thread_id TEXT NOT NULL REFERENCES threads(thread_id) ON DELETE CASCADE,
kind TEXT NOT NULL CHECK (kind IN ('done','archive')),
requested_at TIMESTAMPTZ NOT NULL DEFAULT now(),
synced_at TIMESTAMPTZ,
sync_error TEXT
);
CREATE INDEX idx_actions_unsynced ON actions(synced_at) WHERE synced_at IS NULL;
Why this shape:
threadsholds only the latest message per thread (thread-latest-only decision) — old messages are discarded after thread_id resolution(thread_id, classified_for_latest_uid) UNIQUE→ classification is idempotent and re-runs when a new reply arrives (different UID)actionsis append-only — Done/Archive becomes a row, never anUPDATE threads SET status = .... Easy to audit and replay.- Partial index on
actions(synced_at) WHERE synced_at IS NULL→ sync-actions timer query is constant-time regardless of historical action volume
FastAPI surface
| Endpoint | Auth | Purpose |
|---|---|---|
GET /healthz | none | Liveness probe (Cloudflare tunnel) |
GET /threads | bearer | Returns surfaced rows grouped by P0/P1/P2 |
POST /actions | bearer | {thread_id, kind} → insert into actions |
GET /stats | bearer | Daily counts (for the Mac app’s “30 days at a glance” view) |
Each handler ≈ 20–40 LOC. Bearer check is a single FastAPI dependency that compares against bearer.token with secrets.compare_digest.
Classifier prompt (Haiku 4.5)
SYSTEM_PROMPT = """\
You classify a single email thread into one of: NOISE, P0, P1, P2.
DEFAULT CLASS IS NOISE. You must justify any promotion above NOISE.
Definitions:
- P0 = Requires a human reply within 24h AND no other channel resolves it.
- P1 = Likely needs a reply this week; uncertain.
- P2 = FYI worth a glance, no reply required.
- NOISE = Bot, share-notification, marketing, calendar bot, already-resolved.
You will receive enriched context:
- sender, subject, latest snippet
- jira_status: 'Done' | 'Open' | 'unknown' | 'no-ticket'
- sent_staleness_minutes: int or null (how long since I last sent in this thread)
- future_tense_target: ISO date or null
Hard rules (apply before reasoning):
- If jira_status == 'Done' → downgrade by 1 (P0→P1, P1→P2, P2→NOISE).
- If sent_staleness_minutes != null and sent_staleness_minutes < 30 → NOISE.
- If future_tense_target > now()+7d → cap at P2.
Output JSON only:
{"class": "...", "action_summary": "<verb-first, max 60 chars>", "reason": "<one short clause>"}
"""
Notes on this prompt:
- The “DEFAULT IS NOISE” line is load-bearing. Removing it raises P0 false-positive rate by ~3×.
- Hard rules are also enforced in Python before the LLM call — the prompt restates them so the LLM doesn’t fight the post-processing. Belt and suspenders.
- Output is strict JSON; we use the SDK’s tool-call response format to force schema compliance.
action_summaryconstraint (“verb-first, max 60 chars”) makes row titles like “Reply to vendor re: SLA breach” instead of “Re: [URGENT][FW: FW:] SLA”.
Hard-rule bank
# lib/hardrules.py
SHARE_NOTI_SENDER_RE = re.compile(
r'\(via (Google Sheets|Google Docs|Notion|Dropbox|Figma|Calendly|Loom|Miro)\)',
re.IGNORECASE,
)
SHARE_NOTI_BODY_RE = re.compile(
r'(shared (a |an )?(document|file|sheet|page|board|design) with you'
r'|has shared .* with you'
r'|wants to share)',
re.IGNORECASE,
)
def is_share_notification(sender: str, snippet: str) -> bool:
return bool(SHARE_NOTI_SENDER_RE.search(sender) or SHARE_NOTI_BODY_RE.search(snippet))
FUTURE_TENSE_RE = re.compile(
r'\b(scheduled for|will happen on|reminder for|upcoming|on (Mon|Tue|Wed|Thu|Fri|Sat|Sun)\w*,?\s+\w+\s+\d{1,2})\b',
re.IGNORECASE,
)
JIRA_KEY_RE = re.compile(r'\b([A-Z]{2,8}-\d{1,6})\b')
22% of daily volume hits the share-noti hard-rule and never sees the LLM.
Cross-channel enrich
# Pseudo-code for the per-thread enrich step
def enrich(thread) -> dict:
ctx = {"jira_status": "no-ticket", "sent_staleness_minutes": None, "future_tense_target": None}
if thread.jira_ticket_extracted:
try:
status = jira.get_issue_status(thread.jira_ticket_extracted, cache_ttl=600)
ctx["jira_status"] = status # 'Done' | 'Open' | etc.
except httpx.HTTPError:
ctx["jira_status"] = "unknown" # degraded; log to reason
sent_dt = imap.last_sent_in_thread(thread.thread_id, account=thread.account)
if sent_dt:
ctx["sent_staleness_minutes"] = int((datetime.utcnow() - sent_dt).total_seconds() / 60)
m = FUTURE_TENSE_RE.search(thread.latest_subject + " " + thread.latest_snippet)
if m:
# naive date extraction; production uses dateparser
ctx["future_tense_target"] = try_parse_date(m.group(0))
return ctx
Performance numbers
Measured on the 1-vCPU cron-host VM:
| Operation | Cost | Notes |
|---|---|---|
| IMAP pull, 3 accounts, ~30 new msgs | ~4s | dominated by IMAP RTT, not parsing |
| Thread resolution + upsert (30 msgs) | ~120ms | psycopg executemany |
| Share-noti hard-rule sweep (50 threads) | <10ms | regex only |
| Jira enrich (cached 10 min TTL) | ~50ms warm / ~250ms cold | per thread with a ticket |
| Haiku classification | ~700ms p50 / ~1.4s p95 | ~250 input tokens + ~40 output tokens per call |
| Classifier batch (50 threads) | ~25s wall | sequential calls; well under 5-min cadence |
| Gmail sync (10 actions) | ~3s | batched messages.modify |
/threads API | ~80ms | single SQL + JSON serialize |
| Mail-Assistant.app cold launch | ~250ms | SwiftUI; first paint includes 1 API roundtrip |
Daily totals (week 4):
- ~568 messages pulled
- ~440 hit a hard-rule (share-noti / sender-blocklist) → 0 LLM cost
- ~128 reached the LLM → ~$0.04/day
- ~7 surfaced as P0/P1
Reclassify cost trap (operational metric)
During the 2-day sprint, every prompt pivot tempted a full-corpus re-sweep on all 568 messages to validate. Two days of that burned ~$30–40 in tokens before the lesson stuck: test prompt changes on a 30-row stratified subset first, only sweep the full corpus once the subset metrics are stable. See Notes for the running log.
Reliability features
| Feature | How |
|---|---|
| Idempotent classify | UNIQUE (thread_id, classified_for_latest_uid) |
| Idempotent IMAP pull | UID-based dedupe + threads_audit log |
| Bounded batch | classify-batch LIMIT 50 → cost guardrail |
| Optimistic UI | Row disappears on action press; sync timer is the truth |
| Action retry | sync_error column + cron retry of synced_at IS NULL |
| Auto-restart | All systemd units Restart=on-failure |
| Degraded Jira | Enrich tolerates 5xx, encodes unknown in reason |
| Token TTL | Gmail OAuth refresh-token rotation in sync timer |
| Backup | Nightly pg_dump to encrypted off-VM storage |
Security model
| Threat | Mitigation |
|---|---|
| Tunnel endpoint discovery | Private subdomain, no advertising; optional CF Access policy |
| Bearer theft | TLS only (CF-managed), 32-byte urlsafe token, stored in macOS Keychain on the client |
| IMAP credential leak | /etc/mail-assistant/secrets.env chmod 600, root-owned, not in repo |
| Anthropic API key leak | Same envelope as IMAP creds |
| Gmail OAuth scope creep | Token scoped to gmail.modify only — cannot send |
| Compromised VM exfiltrates email | Only latest_snippet (≤2000 chars) is stored; full bodies live in IMAP only and are fetched on-demand |
| Replay of action POSTs | actions is append-only; the same Done press twice is a harmless dupe row |
| Mac app local analytics | None — no telemetry, no crash reporter, no usage pings |
Reproducibility — quickstart for a forker
# 1. Provision a small VM (1 vCPU / 1 GB is enough)
# 2. Install Postgres 16 + Python 3.11
sudo apt install postgresql-16 python3.11 python3.11-venv
sudo -u postgres createuser mail
sudo -u postgres createdb mail_assistant -O mail
# 3. Clone + venv
python3.11 -m venv /opt/mail-assistant/venv
/opt/mail-assistant/venv/bin/pip install fastapi uvicorn psycopg[binary] \
anthropic google-api-python-client httpx imaplib2 dateparser
# 4. Drop secrets into /etc/mail-assistant/secrets.env (chmod 600)
# IMAP_WORK_USER, IMAP_WORK_PASS, ..., ANTHROPIC_API_KEY, JIRA_PAT
# 5. Apply migrations
psql -U mail -d mail_assistant -f src/migrations/001_threads.sql
psql -U mail -d mail_assistant -f src/migrations/002_classifications.sql
psql -U mail -d mail_assistant -f src/migrations/003_actions.sql
# 6. systemd
sudo cp systemd/*.service systemd/*.timer /etc/systemd/system/
sudo systemctl enable --now mail-assistant-api.service \
mail-assistant-poll-imap.timer \
mail-assistant-classify-batch.timer \
mail-assistant-sync-actions.timer
# 7. cloudflared named tunnel → https://<your-host>
# 8. Build the Mac app
xcodebuild -project MailAssistant.xcodeproj -scheme MailAssistant -configuration Release
# 9. Open Mail-Assistant.app → Settings → paste tunnel URL + bearer
Total: 1–2 hours if all credentials/tokens are ready.
Future work
- Auto-draft for routine replies (cancellation requests, “thanks, received”)
- Per-account budgets — cap personal-account rows at 5/day to protect work focus
- Subset-test harness baked in —
make eval N=30to run any prompt change against a stratified 30-row sample before full sweep (see Notes for the cost-trap lesson) - Retro-style UI pass to align with the rest of the desktop side-project family
- Public eval set — a synthetic 100-row email corpus published so others can benchmark their own triage classifiers
License & attribution
Personal project. Built on:
- Anthropic Python SDK (Claude Haiku 4.5)
- FastAPI
- PostgreSQL 16
- Cloudflare Tunnel free tier