← Back to project
● Shipped P0 Size M Vertical app

Mail-Assistant — Implementation

Tech stack deep-dive: schema, classifier prompt, performance numbers, security model, reproducibility.

Implementation

Sister docs: PRD (intent), Architecture (system view), Notes (decision log).

TL;DR

A production-personal email triage agent built in 2 days:

  • 3 IMAP accounts polled every 5 minutes → Postgres 16
  • Haiku 4.5 classifier with NOISE-as-default + cross-channel verify (Jira + Sent staleness)
  • Native SwiftUI macOS surface (no Electron, no mobile, no Slack digest)
  • 568 daily messages → 7 P0/P1 rows surfaced
  • ~$1.20/month classifier cost
  • Done/Archive actions propagate back to Gmail via the sync timer
  • Cloudflare named tunnel + bearer auth, optional CF Access

Stack

LayerComponentVersionNotes
Computecron-host VM1 vCPU / 1 GBshared with other small services
OSDebian12 minimal
RuntimePython3.11 + venv
WebFastAPI + uvicornlatest127.0.0.1:8090 (tunnel-fronted)
DBPostgreSQL16local socket; 80 MB after 3 months
Pullimaplib + IDLE fallbackstdlib3 accounts, IMAPS only
ClassifierAnthropic Python SDKlatestclaude-haiku-4-5
Action syncgoogle-api-python-clientlatestGmail v1 (modify + labels)
Jira lookuphttpx + Jira REST v3latestPAT auth
Tunnelcloudflaredlatestnamed tunnel, systemd-managed
Schedulersystemd timers3 timers, each OnCalendar=*:0/5 (or 0/2)
UISwift / SwiftUIXcode 15 / macOS 14+NSPanel + Liquid Glass aesthetic

Directory layout

On VM (/opt/mail-assistant/)

src/
├── api.py                        # FastAPI app — /threads /actions /healthz (~180 lines)
├── timers/
│   ├── poll_imap.py              # IMAP pull → upsert threads (~210 lines)
│   ├── classify_batch.py         # share-noti + enrich + Haiku (~270 lines)
│   └── sync_actions.py           # Gmail label + archive sync (~120 lines)
├── lib/
│   ├── db.py                     # psycopg connection pool
│   ├── jira.py                   # GET issue status with cache
│   ├── gmail.py                  # OAuth refresh + modify wrapper
│   ├── claude.py                 # Haiku client + JSON schema enforcement
│   ├── hardrules.py              # share-noti regex bank, future-tense gate
│   └── threading.py              # RFC 5322 references chain resolver
└── migrations/
    ├── 001_threads.sql
    ├── 002_classifications.sql
    └── 003_actions.sql

systemd/
├── mail-assistant-api.service
├── mail-assistant-poll-imap.{service,timer}
├── mail-assistant-classify-batch.{service,timer}
└── mail-assistant-sync-actions.{service,timer}

/etc/mail-assistant/
├── secrets.env                   # IMAP creds, Jira PAT, Anthropic key (chmod 600)
├── gmail-token.json              # OAuth refresh token (chmod 600)
└── bearer.token                  # Mac app bearer (chmod 600)

On operator Mac (~/Applications/Mail-Assistant.app)

MailAssistant/
├── MailAssistantApp.swift        # @main, NSPanel setup
├── PanelController.swift         # NSPanel + Liquid Glass styling
├── Views/
│   ├── TriageView.swift          # 3 collapsible sections
│   ├── RowView.swift             # action_summary + [D] [A] buttons
│   └── EmptyStateView.swift      # "inbox zero" affordance
├── Models/
│   ├── Thread.swift              # Codable mirror of API row
│   └── APIClient.swift           # bearer auth, async/await
└── Resources/
    └── Settings.swift            # tunnel host + bearer (Keychain)

Schema

-- 001_threads.sql
CREATE TABLE threads (
    thread_id              TEXT PRIMARY KEY,        -- RFC 5322-derived stable id
    account                TEXT NOT NULL,           -- 'work' | 'personal-a' | 'personal-b'
    latest_uid             BIGINT NOT NULL,         -- IMAP UID of latest message
    latest_from            TEXT NOT NULL,
    latest_subject         TEXT NOT NULL,
    latest_snippet         TEXT NOT NULL,           -- first ~2000 chars of latest body
    latest_date            TIMESTAMPTZ NOT NULL,
    jira_ticket_extracted  TEXT,                    -- e.g. 'PROJ-1234' or NULL
    updated_at             TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX idx_threads_account_date ON threads(account, latest_date DESC);
CREATE INDEX idx_threads_updated      ON threads(updated_at);

-- 002_classifications.sql
CREATE TABLE classifications (
    id                            BIGSERIAL PRIMARY KEY,
    thread_id                     TEXT NOT NULL REFERENCES threads(thread_id) ON DELETE CASCADE,
    classified_for_latest_uid     BIGINT NOT NULL,   -- matches threads.latest_uid at classify time
    class                         TEXT NOT NULL CHECK (class IN ('NOISE','P0','P1','P2')),
    action_summary                TEXT,              -- "Reply to vendor re: SLA breach"
    reason                        TEXT NOT NULL,     -- "share-noti" | "jira-done" | "llm:..." | "sent-fresh"
    tokens_used                   INT,
    cost_usd                      NUMERIC(8,5),
    classified_at                 TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE (thread_id, classified_for_latest_uid)   -- idempotency
);
CREATE INDEX idx_class_thread ON classifications(thread_id);

-- 003_actions.sql
CREATE TABLE actions (
    id            BIGSERIAL PRIMARY KEY,
    thread_id     TEXT NOT NULL REFERENCES threads(thread_id) ON DELETE CASCADE,
    kind          TEXT NOT NULL CHECK (kind IN ('done','archive')),
    requested_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    synced_at     TIMESTAMPTZ,
    sync_error    TEXT
);
CREATE INDEX idx_actions_unsynced ON actions(synced_at) WHERE synced_at IS NULL;

Why this shape:

  • threads holds only the latest message per thread (thread-latest-only decision) — old messages are discarded after thread_id resolution
  • (thread_id, classified_for_latest_uid) UNIQUE → classification is idempotent and re-runs when a new reply arrives (different UID)
  • actions is append-only — Done/Archive becomes a row, never an UPDATE threads SET status = .... Easy to audit and replay.
  • Partial index on actions(synced_at) WHERE synced_at IS NULL → sync-actions timer query is constant-time regardless of historical action volume

FastAPI surface

EndpointAuthPurpose
GET /healthznoneLiveness probe (Cloudflare tunnel)
GET /threadsbearerReturns surfaced rows grouped by P0/P1/P2
POST /actionsbearer{thread_id, kind} → insert into actions
GET /statsbearerDaily counts (for the Mac app’s “30 days at a glance” view)

Each handler ≈ 20–40 LOC. Bearer check is a single FastAPI dependency that compares against bearer.token with secrets.compare_digest.

Classifier prompt (Haiku 4.5)

SYSTEM_PROMPT = """\
You classify a single email thread into one of: NOISE, P0, P1, P2.

DEFAULT CLASS IS NOISE. You must justify any promotion above NOISE.

Definitions:
- P0 = Requires a human reply within 24h AND no other channel resolves it.
- P1 = Likely needs a reply this week; uncertain.
- P2 = FYI worth a glance, no reply required.
- NOISE = Bot, share-notification, marketing, calendar bot, already-resolved.

You will receive enriched context:
- sender, subject, latest snippet
- jira_status: 'Done' | 'Open' | 'unknown' | 'no-ticket'
- sent_staleness_minutes: int or null (how long since I last sent in this thread)
- future_tense_target: ISO date or null

Hard rules (apply before reasoning):
- If jira_status == 'Done' → downgrade by 1 (P0→P1, P1→P2, P2→NOISE).
- If sent_staleness_minutes != null and sent_staleness_minutes < 30 → NOISE.
- If future_tense_target > now()+7d → cap at P2.

Output JSON only:
{"class": "...", "action_summary": "<verb-first, max 60 chars>", "reason": "<one short clause>"}
"""

Notes on this prompt:

  • The “DEFAULT IS NOISE” line is load-bearing. Removing it raises P0 false-positive rate by ~3×.
  • Hard rules are also enforced in Python before the LLM call — the prompt restates them so the LLM doesn’t fight the post-processing. Belt and suspenders.
  • Output is strict JSON; we use the SDK’s tool-call response format to force schema compliance.
  • action_summary constraint (“verb-first, max 60 chars”) makes row titles like “Reply to vendor re: SLA breach” instead of “Re: [URGENT][FW: FW:] SLA”.

Hard-rule bank

# lib/hardrules.py

SHARE_NOTI_SENDER_RE = re.compile(
    r'\(via (Google Sheets|Google Docs|Notion|Dropbox|Figma|Calendly|Loom|Miro)\)',
    re.IGNORECASE,
)
SHARE_NOTI_BODY_RE = re.compile(
    r'(shared (a |an )?(document|file|sheet|page|board|design) with you'
    r'|has shared .* with you'
    r'|wants to share)',
    re.IGNORECASE,
)

def is_share_notification(sender: str, snippet: str) -> bool:
    return bool(SHARE_NOTI_SENDER_RE.search(sender) or SHARE_NOTI_BODY_RE.search(snippet))


FUTURE_TENSE_RE = re.compile(
    r'\b(scheduled for|will happen on|reminder for|upcoming|on (Mon|Tue|Wed|Thu|Fri|Sat|Sun)\w*,?\s+\w+\s+\d{1,2})\b',
    re.IGNORECASE,
)

JIRA_KEY_RE = re.compile(r'\b([A-Z]{2,8}-\d{1,6})\b')

22% of daily volume hits the share-noti hard-rule and never sees the LLM.

Cross-channel enrich

# Pseudo-code for the per-thread enrich step
def enrich(thread) -> dict:
    ctx = {"jira_status": "no-ticket", "sent_staleness_minutes": None, "future_tense_target": None}

    if thread.jira_ticket_extracted:
        try:
            status = jira.get_issue_status(thread.jira_ticket_extracted, cache_ttl=600)
            ctx["jira_status"] = status        # 'Done' | 'Open' | etc.
        except httpx.HTTPError:
            ctx["jira_status"] = "unknown"     # degraded; log to reason

    sent_dt = imap.last_sent_in_thread(thread.thread_id, account=thread.account)
    if sent_dt:
        ctx["sent_staleness_minutes"] = int((datetime.utcnow() - sent_dt).total_seconds() / 60)

    m = FUTURE_TENSE_RE.search(thread.latest_subject + " " + thread.latest_snippet)
    if m:
        # naive date extraction; production uses dateparser
        ctx["future_tense_target"] = try_parse_date(m.group(0))

    return ctx

Performance numbers

Measured on the 1-vCPU cron-host VM:

OperationCostNotes
IMAP pull, 3 accounts, ~30 new msgs~4sdominated by IMAP RTT, not parsing
Thread resolution + upsert (30 msgs)~120mspsycopg executemany
Share-noti hard-rule sweep (50 threads)<10msregex only
Jira enrich (cached 10 min TTL)~50ms warm / ~250ms coldper thread with a ticket
Haiku classification~700ms p50 / ~1.4s p95~250 input tokens + ~40 output tokens per call
Classifier batch (50 threads)~25s wallsequential calls; well under 5-min cadence
Gmail sync (10 actions)~3sbatched messages.modify
/threads API~80mssingle SQL + JSON serialize
Mail-Assistant.app cold launch~250msSwiftUI; first paint includes 1 API roundtrip

Daily totals (week 4):

  • ~568 messages pulled
  • ~440 hit a hard-rule (share-noti / sender-blocklist) → 0 LLM cost
  • ~128 reached the LLM → ~$0.04/day
  • ~7 surfaced as P0/P1

Reclassify cost trap (operational metric)

During the 2-day sprint, every prompt pivot tempted a full-corpus re-sweep on all 568 messages to validate. Two days of that burned ~$30–40 in tokens before the lesson stuck: test prompt changes on a 30-row stratified subset first, only sweep the full corpus once the subset metrics are stable. See Notes for the running log.

Reliability features

FeatureHow
Idempotent classifyUNIQUE (thread_id, classified_for_latest_uid)
Idempotent IMAP pullUID-based dedupe + threads_audit log
Bounded batchclassify-batch LIMIT 50 → cost guardrail
Optimistic UIRow disappears on action press; sync timer is the truth
Action retrysync_error column + cron retry of synced_at IS NULL
Auto-restartAll systemd units Restart=on-failure
Degraded JiraEnrich tolerates 5xx, encodes unknown in reason
Token TTLGmail OAuth refresh-token rotation in sync timer
BackupNightly pg_dump to encrypted off-VM storage

Security model

ThreatMitigation
Tunnel endpoint discoveryPrivate subdomain, no advertising; optional CF Access policy
Bearer theftTLS only (CF-managed), 32-byte urlsafe token, stored in macOS Keychain on the client
IMAP credential leak/etc/mail-assistant/secrets.env chmod 600, root-owned, not in repo
Anthropic API key leakSame envelope as IMAP creds
Gmail OAuth scope creepToken scoped to gmail.modify only — cannot send
Compromised VM exfiltrates emailOnly latest_snippet (≤2000 chars) is stored; full bodies live in IMAP only and are fetched on-demand
Replay of action POSTsactions is append-only; the same Done press twice is a harmless dupe row
Mac app local analyticsNone — no telemetry, no crash reporter, no usage pings

Reproducibility — quickstart for a forker

# 1. Provision a small VM (1 vCPU / 1 GB is enough)
# 2. Install Postgres 16 + Python 3.11
sudo apt install postgresql-16 python3.11 python3.11-venv
sudo -u postgres createuser mail
sudo -u postgres createdb mail_assistant -O mail

# 3. Clone + venv
python3.11 -m venv /opt/mail-assistant/venv
/opt/mail-assistant/venv/bin/pip install fastapi uvicorn psycopg[binary] \
    anthropic google-api-python-client httpx imaplib2 dateparser

# 4. Drop secrets into /etc/mail-assistant/secrets.env (chmod 600)
#    IMAP_WORK_USER, IMAP_WORK_PASS, ..., ANTHROPIC_API_KEY, JIRA_PAT

# 5. Apply migrations
psql -U mail -d mail_assistant -f src/migrations/001_threads.sql
psql -U mail -d mail_assistant -f src/migrations/002_classifications.sql
psql -U mail -d mail_assistant -f src/migrations/003_actions.sql

# 6. systemd
sudo cp systemd/*.service systemd/*.timer /etc/systemd/system/
sudo systemctl enable --now mail-assistant-api.service \
    mail-assistant-poll-imap.timer \
    mail-assistant-classify-batch.timer \
    mail-assistant-sync-actions.timer

# 7. cloudflared named tunnel → https://<your-host>

# 8. Build the Mac app
xcodebuild -project MailAssistant.xcodeproj -scheme MailAssistant -configuration Release
# 9. Open Mail-Assistant.app → Settings → paste tunnel URL + bearer

Total: 1–2 hours if all credentials/tokens are ready.

Future work

  • Auto-draft for routine replies (cancellation requests, “thanks, received”)
  • Per-account budgets — cap personal-account rows at 5/day to protect work focus
  • Subset-test harness baked inmake eval N=30 to run any prompt change against a stratified 30-row sample before full sweep (see Notes for the cost-trap lesson)
  • Retro-style UI pass to align with the rest of the desktop side-project family
  • Public eval set — a synthetic 100-row email corpus published so others can benchmark their own triage classifiers

License & attribution

Personal project. Built on: