
Personal-RAG — PRD

Product spec, scope, milestones, and success metrics for Personal-RAG.

Personal RAG Knowledge Base — PRD

Size S · P0 · Foundation
Status: ✅ M1 done (2026-04-29), S3 migration done (2026-05-24); see Implementation for build details.
Originally planned: 1 weekend / Actual: ~2 days concentrated work + ongoing iteration

1. Problem

Personal knowledge is fragmented across many sources: wiki/documentation pages, ticketing systems, chat threads, meeting transcripts, email, markdown notes, and AI conversation transcripts. When trying to recall “what did I read about X?”, finding the answer takes 15–30 minutes of manual hunting across separate tools. OS-level search (Spotlight) is keyword-only and doesn’t understand semantic intent. Per-source MCP search is slow and bloats the context window with raw chunks.

Pain: ~90% of consumed knowledge is not retrievable on demand.

Why now: this is a foundation pattern that 6+ downstream AI projects can reuse (recipe extractor, research agent, support bot, finance advisor, etc.). Build once, reuse many times. The corpus is also multi-workspace by nature — work knowledge, consulting knowledge, personal notes, and external vendor reference docs each have different trust tiers and need scoped access.

2. Goal & Success Metrics

Goal: Ask a Claude client “any thoughts on vector DB X for ~100K vectors?” → get top-5 relevant chunks + source links in under 3 seconds, scoped to the right workspace automatically.

Metrics — actual achieved:

Metric | Target (M1) | Achieved | Note
Hit@3 on held-out eval | ≥80% | 97.8% | 93-query personal-workspace eval (2026-05-21)
MRR on held-out eval | ≥0.85 | 0.948 | Same eval set
Latency p95 warm | <3 s | 840 ms | Embed query + ANN search + rerank + tunnel
Latency p95 cold | <5 s | 2.3 s | Includes model load
Sources ingested | 100 docs | 42,000+ sources / 182,000+ chunks | Across 6 workspaces
Workspaces | 1 | 6 | ll / mindx / _personal / _shared / _canon / _secrets
Touchpoint | Telegram | MCP: Claude Desktop + Claude.ai web + iOS app | Pivoted Day 1, validated Day 8
Resource footprint | n/a | 388 MB idle / 501 MB active | Local MBP M2 Max
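Hit@k and MRR are standard ranking metrics. As a minimal sketch (assuming each eval query has exactly one relevant source ID and the retriever returns a ranked list of IDs; this data layout is hypothetical, not the project's eval harness), they can be computed like this:

```python
from typing import Sequence


def hit_at_k(ranked: Sequence[Sequence[str]], gold: Sequence[str], k: int) -> float:
    """Fraction of queries whose relevant ID appears in the top-k results."""
    hits = sum(1 for results, g in zip(ranked, gold) if g in results[:k])
    return hits / len(gold)


def mean_reciprocal_rank(ranked: Sequence[Sequence[str]], gold: Sequence[str]) -> float:
    """Average of 1/rank of the relevant ID (0 when it is not retrieved at all)."""
    total = 0.0
    for results, g in zip(ranked, gold):
        if g in results:
            total += 1.0 / (list(results).index(g) + 1)
    return total / len(gold)


# Toy eval: relevant doc at rank 1, at rank 2, and not retrieved.
ranked = [["a", "b", "c"], ["x", "d", "y"], ["p", "q", "r"]]
gold = ["a", "d", "z"]
print(hit_at_k(ranked, gold, k=3))          # 2 of 3 queries hit
print(mean_reciprocal_rank(ranked, gold))   # (1 + 0.5 + 0) / 3 = 0.5
```

For scale: on a 93-query set, 97.8% Hit@3 corresponds to 91 hits (91 / 93 ≈ 0.978).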

3. User journey (revised)

Pivot from original: dropped a custom Telegram bot in favor of MCP, so Claude clients can call workspace-scoped tools (kb_search_ll, kb_search_personal, etc.) natively.

  1. A sync agent (chat exporter, meeting transcriber, AI session hook, ticket crawler, vendor-doc crawler) writes a .md file into the local KB mount path.
  2. A filesystem watcher (tako-mount-watcher, fswatch-driven) detects the write and calls kb-ingest-file.py → chunks + embeds + stores.
  3. User asks Claude: “Anything new on topic X this week?”
  4. Claude routes to the right scoped tool (e.g. kb_search_ll with client_filter='PCF' when the query mentions PCF) → returns the top-K chunks with source URIs.
  5. Claude synthesizes an answer with citations.
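Step 2's "chunks + embeds + stores" can be sketched as below. This is a minimal sketch assuming fixed-size character windows with overlap; the real kb-ingest-file.py may split differently (for example by markdown headings or by tokens), and `chunk_text`, `ingest_file`, `size`, `overlap` and the `embed` callable are hypothetical names. Storage is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Chunk:
    source_uri: str
    chunk_idx: int
    text: str
    embedding: List[float]


def chunk_text(text: str, size: int = 800, overlap: int = 100) -> List[str]:
    """Split text into windows of `size` chars, each overlapping the previous by `overlap`."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):  # last window already reaches the end
            break
    return chunks


def ingest_file(source_uri: str, text: str,
                embed: Callable[[str], List[float]]) -> List[Chunk]:
    """Chunk a document and attach an embedding to each chunk (hypothetical helper)."""
    return [Chunk(source_uri, i, piece, embed(piece))
            for i, piece in enumerate(chunk_text(text))]
```

The overlap keeps a sentence that straddles a window boundary retrievable from at least one chunk; `chunk_idx` is what later shows up in citations.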

4. Scope (MoSCoW) — final

Must — DONE:

  • ✅ Ingest markdown files (URL/PDF supported via downstream sync agents)
  • ✅ Chunk + embed + store in Postgres 16 + pgvector with HNSW
  • ✅ MCP tools per workspace: kb_health, kb_ingest, kb_search_*, kb_stats
  • ✅ Citations: search results include source_uri and chunk_idx
  • ✅ Multi-workspace isolation with scoped tools
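One way to get this isolation is to register a separate search function per workspace, each with its workspace filter fixed at creation time so a caller cannot widen the scope. The sketch below assumes an in-memory list of (workspace, source_uri, chunk_idx, vector) rows and cosine similarity; in the real system the filter would be a WHERE clause against pgvector. Workspace names come from this PRD; `make_scoped_search`, `build_tools`, and the derived names kb_search_mindx / kb_search_shared are hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

# (workspace, source_uri, chunk_idx, embedding); stands in for the pgvector table.
Row = Tuple[str, str, int, List[float]]

WORKSPACES = ["ll", "mindx", "_personal", "_shared", "_canon", "_secrets"]
EXPOSED = [ws for ws in WORKSPACES if ws != "_secrets"]  # _secrets never gets an MCP tool


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def make_scoped_search(index: List[Row], workspace: str) -> Callable[..., List[dict]]:
    """Return a search function that can only see rows from one workspace."""
    def search(query_vec: List[float], top_k: int = 5) -> List[dict]:
        scored = [(cosine(query_vec, vec), uri, idx)
                  for ws, uri, idx, vec in index if ws == workspace]
        scored.sort(key=lambda t: t[0], reverse=True)
        # Each hit carries source_uri + chunk_idx so answers can cite it.
        return [{"source_uri": uri, "chunk_idx": idx, "score": s}
                for s, uri, idx in scored[:top_k]]
    return search


def build_tools(index: List[Row]) -> Dict[str, Callable[..., List[dict]]]:
    """One tool per exposed workspace, e.g. kb_search_ll, kb_search_personal."""
    return {f"kb_search_{ws.lstrip('_')}": make_scoped_search(index, ws) for ws in EXPOSED}
```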

Should — DONE:

  • ✅ Idempotent ingest via server-side SHA-256 hash check
  • ✅ Re-ingest replaces chunks if content hash changed
  • ✅ Auto-tag from folder hierarchy (path-based classifier)
  • ✅ Reranker stage (bge-reranker-v2-m3) on top-N candidates: marginal +3.2pp Hit@1 gain, shipped as default
  • ⏸️ BM25 + vector hybrid: deferred, since semantic search + reranker proved sufficient on the eval set
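The idempotency and re-ingest rules above reduce to comparing a content hash before doing any embedding work, and the folder-based auto-tag can be read straight off the path. A minimal in-memory sketch, assuming the store keeps one SHA-256 per source_uri and that tags are simply the folder names under the KB root; `KBStore`, `path_tags`, the status strings, and the paragraph split used as a stand-in chunker are all hypothetical (the real server does this check against Postgres).

```python
import hashlib
from pathlib import PurePosixPath
from typing import Dict, List


def path_tags(path: str, root: str) -> List[str]:
    """Tag a file with the folder names between `root` and the file itself."""
    return list(PurePosixPath(path).relative_to(root).parent.parts)


class KBStore:
    def __init__(self) -> None:
        self.hashes: Dict[str, str] = {}         # source_uri -> sha256 of content
        self.chunks: Dict[str, List[str]] = {}   # source_uri -> chunk texts

    def ingest(self, source_uri: str, content: str) -> str:
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if self.hashes.get(source_uri) == digest:
            return "skipped"                     # unchanged: no re-embed, no write
        status = "replaced" if source_uri in self.hashes else "inserted"
        # Replace, never append: all old chunks for this source are dropped.
        self.chunks[source_uri] = [p for p in content.split("\n\n") if p.strip()]
        self.hashes[source_uri] = digest
        return status
```

Because the hash check happens before chunking and embedding, a watcher that fires twice on the same file costs only one SHA-256 computation.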

Could — partial:

  • ⏸️ Apple Notes import — out of scope, low ROI for actual usage
  • ⏸️ Kindle highlights import — same reason
  • ✅ Wiki/docs crawler integration — handled by downstream scripts
  • ✅ Vendor-doc crawler (Mode C of the audit/canon pipeline) — populates _canon workspace
  • ❌ Custom web UI — replaced by Claude clients themselves

Won’t (M1–M3) — kept:

  • Multi-user support (single-user system by design)
  • Real-time sync — fswatch + hourly mirror is sufficient
  • Native mobile app — Claude iOS app inherits via OAuth

5. Architecture (final)

Pivoted from “RAG + Telegram Bot” → MCP Server + Multi-workspace scoped retrieval + Filesystem-first ingest. See Architecture for diagrams.

6. Tech Stack — final choices

Layer | Original spec | Implemented | Reason for change
LLM serving | Local Llama 3.2 3B | External (caller's Claude client) | MCP delegates LLM to caller; server only does retrieval
Embedder | BGE-small-en-v1.5 | bge-m3 (multilingual) | English-only embedder underperformed on mixed VN/EN; bge-m3 handles VN+EN+code-mixed
Reranker | (deferred) | bge-reranker-v2-m3 (cross-encoder) | +3.2pp Hit@1 on eval set, negligible latency at top-20
Vector DB | Oracle ADB 23ai | Postgres 16 + pgvector (HNSW) | Local deploy, no cloud dependency for the hot path, fits 3.2 GB comfortably
Object storage | (none) | MinIO S3 primary + filesystem mirror fallback | BlobStore abstraction; dual-scheme URIs (s3:// + file://)
Bot framework | python-telegram-bot | MCP Streamable HTTP | Native Claude integration, multi-client for free
HTTP framework | | FastMCP + Starlette + uvicorn | MCP SDK provides this out of the box
Tunnel | Public IP + nginx | Cloudflare named tunnel | Persistent URL, no inbound port open; optional, used only for remote-device access
Auth | "Telegram only" | OAuth 2.0 (PKCE + DCR) + legacy bearer | Supports mobile/web/desktop clients
Runtime host | Cloud VM | Local MacBook Pro M2 Max | Low latency, no cloud bill, hardware already paid for; daemon runs under launchd
Auto-ingest | systemd cron | launchd tako-mount-watcher (fswatch) | Real-time, no polling
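The Embedder and Reranker rows describe a two-stage retrieve-then-rerank pipeline: a cheap bi-encoder ANN search recalls the top-N candidates (top-20 per the table), then a cross-encoder scores each (query, chunk) pair jointly and only the best K are returned. A dependency-free sketch; `ann_search` and `cross_score` are hypothetical stand-ins for the pgvector query and bge-reranker-v2-m3, so any toy scorers plugged in only illustrate the control flow.

```python
from typing import Callable, List, Tuple

Candidate = Tuple[str, str]  # (source_uri, chunk_text)


def retrieve_and_rerank(
    query: str,
    ann_search: Callable[[str, int], List[Candidate]],
    cross_score: Callable[[str, str], float],
    top_n: int = 20,
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Stage 1: recall top_n by vector similarity. Stage 2: rescore with a cross-encoder."""
    candidates = ann_search(query, top_n)
    scored = [(uri, cross_score(query, text)) for uri, text in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

Only `top_n` pairs ever reach the cross-encoder, so rerank cost stays bounded no matter how large the corpus grows; recall is still set by the first stage.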

Cost posture: $0/month for compute (own hardware) + $0 for Cloudflare tunnel. No managed services in the hot path. The architecture intentionally avoids vendor lock-in — every component is OSS-replaceable.

7. Milestones — actual

Day | What shipped
Day 1 | MCP server scaffold (FastMCP), tunnel, bearer auth
Day 2 | kb_ingest / kb_search / kb_stats tools, embed bench, schema applied
Day 3 | Bulk migrate first 5,000+ sources (~95 min wall)
Day 3b | Embed model swap to multilingual model (VN/EN parity)
Day 4-5 | Refactor sync sources to call MCP kb_ingest (filesystem-first rule)
Day 6 | Persistent named tunnel via custom domain
Day 7 | Weekly backup + restore script
Day 8 | OAuth 2.0 + DCR + PKCE → Claude.ai web + iOS app access
2026-05-x | Multi-workspace refactor: 6 workspaces + scoped MCP tools
2026-05-21 | Reranker shipped (bge-reranker-v2-m3); Hit@3 = 97.8% on held-out eval
2026-05-24 | S3 migration: BlobStore abstraction (MinIO primary + FS mirror); v0.6.0-s3
2026-05-25 | _canon workspace + kb_search_canon tool for vendor authoritative docs
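The Day 8 OAuth work relies on PKCE (RFC 7636): the client sends a hashed `code_challenge` with the authorization request and later proves possession by presenting the raw `code_verifier` at the token endpoint. A minimal standard-library sketch of the S256 method; the function names are hypothetical and this is not the server's code.

```python
import base64
import hashlib
import hmac
import secrets


def b64url(data: bytes) -> str:
    """Base64url without '=' padding, as RFC 7636 requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_verifier() -> str:
    # 32 random bytes -> 43 base64url chars, the minimum verifier length allowed.
    return b64url(secrets.token_bytes(32))


def s256_challenge(verifier: str) -> str:
    """code_challenge = BASE64URL(SHA256(ASCII(code_verifier)))."""
    return b64url(hashlib.sha256(verifier.encode("ascii")).digest())


def verify_pkce(verifier: str, stored_challenge: str) -> bool:
    """Server side: recompute the challenge and compare in constant time."""
    return hmac.compare_digest(s256_challenge(verifier), stored_challenge)
```

PKCE is what lets public clients such as the iOS app and Claude.ai web use the authorization-code flow without a client secret.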

M1 DoD passed:

  • ✅ Ingest 5,000+ sources (target 10) — now 42K+
  • ✅ Real queries answered correctly — Hit@3 = 97.8% on held-out eval
  • ✅ Latency 840 ms p95 warm (target <5s)

8. Cost & Quota

Item | Free? | Actual usage
Postgres 16 + pgvector (local) | ✅ | 3.2 GB on disk
MinIO S3 (local) | ✅ | ~50 GB blob mirror
Cloudflare named tunnel | ✅ | <1 MB/day data
Compute (MBP M2 Max) | ✅ (owned hw) | 388 MB idle / 501 MB active
Daily Postgres backup (tako-pg-backup) | ✅ | local
Hourly FS mirror (tako-fs-backup) | ✅ | local

No cloud serving cost. The optional Cloudflare tunnel is only used when accessing from a non-host device (phone, other laptop).

9. Risks & open questions — outcomes

Original risks:

  • Cloud DB free-tier eviction → eliminated by going local Postgres
  • Embedding quality on Vietnamese text → resolved with bge-m3 (multilingual SOTA at 568M params)
  • Local LLM crash recovery → N/A (LLM not used in final architecture)

Current risks:

  • Single-host SPOF — mitigated by hourly FS mirror + daily Postgres dump; restore tested
  • Cloudflare bot management blocking default UAs → fixed with custom UA header
  • Token rotation relies on operator memory → push tokens to a password manager

Original open Qs:

  • Q1: Web UI from M2? → ❌ dropped, Claude clients are sufficient
  • Q2: Privacy of sensitive content? → ✅ accepted, single-tenant local deployment; _secrets workspace stays out of the MCP surface
  • Q3: Reranker latency on CPU? → resolved, shipped 2026-05-21

10. Definition of Done

M1 Done: ✅ 2026-04-29 — initial corpus ingested, OAuth flow live, multilingual search working, DR backup in place.

S3 milestone done: ✅ 2026-05-24 — BlobStore abstraction shipped, 1000-row regression at 100% parity, hourly + daily backups via launchd.
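The BlobStore abstraction (MinIO S3 primary, filesystem mirror fallback, dual-scheme s3:// and file:// URIs) and the parity regression can be sketched as a small interface whose reads fall back to the mirror, plus a hash comparison across both backends. This sketch assumes in-memory backends keyed by object key and illustrative URI prefixes; `BlobStore`, `MemoryBackend`, `parity_ok` and the prefixes are hypothetical, and a real primary would wrap an S3 client.

```python
import hashlib
from typing import Dict, Iterable, List, Optional, Protocol


class BlobBackend(Protocol):
    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> Optional[bytes]: ...
    def uri(self, key: str) -> str: ...


class MemoryBackend:
    """Stand-in for MinIO (prefix 's3://...') or the FS mirror (prefix 'file://...')."""
    def __init__(self, uri_prefix: str) -> None:
        self.uri_prefix = uri_prefix
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> Optional[bytes]:
        return self._objects.get(key)

    def uri(self, key: str) -> str:
        return self.uri_prefix + key


class BlobStore:
    def __init__(self, primary: BlobBackend, mirror: BlobBackend) -> None:
        self.primary, self.mirror = primary, mirror

    def put(self, key: str, data: bytes) -> List[str]:
        """Write to both backends; return one URI per scheme (primary first)."""
        self.primary.put(key, data)
        self.mirror.put(key, data)
        return [self.primary.uri(key), self.mirror.uri(key)]

    def get(self, key: str) -> Optional[bytes]:
        """Read from the primary; fall back to the mirror if missing or unreachable."""
        try:
            data = self.primary.get(key)
        except OSError:  # includes ConnectionError from a real network client
            data = None
        return data if data is not None else self.mirror.get(key)


def parity_ok(store: BlobStore, keys: Iterable[str]) -> bool:
    """True when every key exists in both backends with identical SHA-256."""
    for key in keys:
        a, b = store.primary.get(key), store.mirror.get(key)
        if a is None or b is None or hashlib.sha256(a).digest() != hashlib.sha256(b).digest():
            return False
    return True
```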

M3 Done (production-ready):

  • ⏳ TOTP 2FA or SSO wrap on /login
  • ✅ Reranker eval on top-N candidates → measured +3.2pp Hit@1
  • ✅ Daily-driver criterion: in continuous personal use across 4+ weeks without architecture pivot
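The pending TOTP item would most likely follow RFC 6238. A standard-library sketch of code generation and verification with a ±1 step drift window, assuming SHA-1, 30-second steps and 6 digits (the common authenticator-app defaults); the function names are hypothetical and secret storage is out of scope.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226 HOTP: HMAC-SHA1 over the 8-byte big-endian counter, dynamically truncated."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP: HOTP with counter = floor(unix_time / step)."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)


def verify_totp(secret: bytes, code: str, at: Optional[float] = None,
                step: int = 30, window: int = 1) -> bool:
    """Accept codes from the current step and `window` steps either side (clock drift)."""
    now = time.time() if at is None else at
    return any(
        hmac.compare_digest(totp(secret, now + d * step, step), code)
        for d in range(-window, window + 1)
        if now + d * step >= 0  # skip negative counters near the epoch
    )
```

The RFC test secret `b"12345678901234567890"` gives HOTP codes 755224 (counter 0) and 287082 (counter 1), which is a quick way to sanity-check an implementation before wiring it into /login.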

See also

  • Implementation — technical deep-dive (deploy, code structure, perf numbers)
  • Architecture — component diagrams, data flow, security model
  • Notes — chronological decision log