Personal RAG Knowledge Base — PRD
Size S · P0 · Foundation
Status: ✅ M1 done (2026-04-29), S3 migration done (2026-05-24); see Implementation for build details
Originally planned: 1 weekend / Actual: ~2 days concentrated work + ongoing iteration
1. Problem
Personal knowledge is fragmented across many sources: wiki/documentation pages, ticketing systems, chat threads, meeting transcripts, email, markdown notes, and AI conversation transcripts. When trying to recall “what did I read about X?”, finding the answer takes 15–30 minutes of manual hunting across separate tools. OS-level search (Spotlight) is keyword-only and doesn’t understand semantic intent. Per-source MCP search is slow and bloats the context window with raw chunks.
Pain: ~90% of consumed knowledge is not retrievable on demand.
Why now: this is a foundation pattern that 6+ downstream AI projects can reuse (recipe extractor, research agent, support bot, finance advisor, etc.). Build once, reuse many times. The corpus is also multi-workspace by nature — work knowledge, consulting knowledge, personal notes, and external vendor reference docs each have different trust tiers and need scoped access.
2. Goal & Success Metrics
Goal: Ask a Claude client “any thoughts on vector DB X for ~100K vectors?” → get top-5 relevant chunks + source links in under 3 seconds, scoped to the right workspace automatically.
Metrics — actual achieved:
| Metric | Target M1 | Achieved | Note |
|---|---|---|---|
| Hit@3 on held-out eval | ≥80% | 97.8% | 93-query personal-workspace eval (2026-05-21) |
| MRR on held-out eval | ≥0.85 | 0.948 | Same eval set |
| Latency p95 warm | <3s | 840 ms | Embed query + ANN search + rerank + tunnel |
| Latency p95 cold | <5s | 2.3 s | Includes model load |
| Sources ingested | 100 docs | 42,000+ sources / 182,000+ chunks | Across 6 workspaces |
| Workspaces | 1 | 6 | ll / mindx / _personal / _shared / _canon / _secrets |
| Touchpoint | Telegram | MCP: Claude Desktop + Claude.ai web + iOS app | Pivoted Day 1, validated Day 8 |
| Resource footprint | n/a | 388 MB idle / 501 MB active | Local MBP M2 Max |
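For reference, the two quality metrics in the table can be computed as in the sketch below. It assumes the standard definitions (Hit@k: at least one relevant source in the top k; MRR: mean of 1/rank of the first relevant result); the PRD does not spell out the exact eval harness, so the function names and toy data here are illustrative only.

```python
from typing import Sequence


def hit_at_k(ranked: Sequence[str], relevant: set, k: int) -> bool:
    """True if any of the top-k results is a relevant source."""
    return any(doc in relevant for doc in ranked[:k])


def reciprocal_rank(ranked: Sequence[str], relevant: set) -> float:
    """1/rank of the first relevant result, or 0.0 if none is returned."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(runs: list, k: int = 3) -> dict:
    """Average Hit@k and MRR over (ranked_results, relevant_set) pairs."""
    n = len(runs)
    hits = sum(hit_at_k(ranked, rel, k) for ranked, rel in runs)
    mrr = sum(reciprocal_rank(ranked, rel) for ranked, rel in runs)
    return {"hit_at_k": hits / n, "mrr": mrr / n}


# Toy data with hypothetical source names, not taken from the real eval set.
toy_runs = [
    (["a.md", "b.md", "c.md"], {"a.md"}),        # relevant at rank 1
    (["x.md", "y.md", "z.md"], {"z.md"}),        # relevant at rank 3
    (["p.md", "q.md", "r.md"], {"missing.md"}),  # no relevant result
]
print(evaluate(toy_runs, k=3))
```

Under these definitions, a Hit@3 of 97.8% on a 93-query set would correspond to 91 of 93 queries having a relevant source in the top 3.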
3. User journey (revised)
Pivot from original: Dropped a custom Telegram bot in favor of MCP — Claude clients can call workspace-scoped tools (kb_search_ll, kb_search_personal, etc.) natively.
- A sync agent (chat exporter, meeting transcriber, AI session hook, ticket crawler, vendor-doc crawler) writes a `.md` file into the local KB mount path.
- A filesystem watcher (`tako-mount-watcher`, fswatch-driven) detects the write and calls `kb-ingest-file.py` → chunks + embeds + stores.
- User asks Claude: “Anything new on topic X this week?”
- Claude routes to the right scoped tool (e.g. `kb_search_ll` with `client_filter='PCF'` when the query mentions PCF) → returns top-K chunks with source URIs.
- Claude synthesizes an answer with citations.
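The scoping step in this journey can be sketched as one search function per workspace, each bound to its workspace at construction time so a caller cannot reach across workspaces. This is a minimal in-memory illustration, not the server's code: the `Chunk` fields, the `make_scoped_search` helper, the file paths, and the substring match (a placeholder for vector search) are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Chunk:
    workspace: str
    client: Optional[str]  # hypothetical per-client tag used by client_filter
    source_uri: str
    chunk_idx: int
    text: str


# Hypothetical in-memory corpus standing in for the Postgres store.
CORPUS = [
    Chunk("ll", "PCF", "file:///kb/ll/pcf/kickoff.md", 0, "PCF kickoff notes"),
    Chunk("ll", "ACME", "file:///kb/ll/acme/review.md", 2, "ACME quarterly review"),
    Chunk("_personal", None, "file:///kb/_personal/reading.md", 1, "PCF reading list"),
]


def make_scoped_search(workspace: str):
    """Build a search function that can only ever see one workspace."""

    def search(query: str, client_filter: Optional[str] = None, top_k: int = 5) -> list:
        hits = [
            c for c in CORPUS
            if c.workspace == workspace
            and (client_filter is None or c.client == client_filter)
            and query.lower() in c.text.lower()  # placeholder for vector search
        ]
        # Every result carries its citation fields.
        return [
            {"source_uri": c.source_uri, "chunk_idx": c.chunk_idx, "text": c.text}
            for c in hits[:top_k]
        ]

    return search


kb_search_ll = make_scoped_search("ll")
kb_search_personal = make_scoped_search("_personal")

print(kb_search_ll("pcf", client_filter="PCF"))
```

Binding the workspace in a closure rather than accepting it as a parameter means the isolation does not depend on the caller passing the right argument.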
4. Scope (MoSCoW) — final
Must — DONE:
- ✅ Ingest markdown files (URL/PDF supported via downstream sync agents)
- ✅ Chunk + embed + store in Postgres 16 + pgvector with HNSW
- ✅ MCP tools per workspace: `kb_health`, `kb_ingest`, `kb_search_*`, `kb_stats`
- ✅ Citations: search results include `source_uri` and `chunk_idx`
- ✅ Multi-workspace isolation with scoped tools
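To illustrate how chunking and citations fit together, the sketch below splits a document into overlapping character windows and tags each chunk with `source_uri` and `chunk_idx`. The windowing strategy, the default sizes, and the `chunk_text` / `to_records` names are assumptions for illustration; the PRD does not state how the real ingest chunks text.

```python
def chunk_text(text: str, size: int = 200, overlap: int = 40) -> list:
    """Split text into windows of `size` characters that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


def to_records(source_uri: str, text: str, size: int = 200, overlap: int = 40) -> list:
    """Attach citation fields so every search hit can point back to its source."""
    return [
        {"source_uri": source_uri, "chunk_idx": idx, "text": chunk}
        for idx, chunk in enumerate(chunk_text(text, size, overlap))
    ]


# Hypothetical file path; tiny sizes so the overlap is easy to see.
records = to_records("file:///kb/notes/example.md", "abcdefghij", size=4, overlap=1)
for record in records:
    print(record)
```

The embedding step would then run over each record's `text`, with `source_uri` and `chunk_idx` stored alongside the vector.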
Should — DONE:
- ✅ Idempotent ingest via server-side SHA-256 hash check
- ✅ Re-ingest replaces chunks if content hash changed
- ✅ Auto-tag from folder hierarchy (path-based classifier)
- ✅ Reranker stage (`bge-reranker-v2-m3`) on top-N: marginal +3.2pp Hit@1, shipped as default
- ⏸️ BM25 + vector hybrid: semantic + reranker proved sufficient on the eval set
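The idempotent-ingest and replace-on-change behaviour listed above can be sketched as follows. This is an in-memory stand-in under stated assumptions: the `InMemoryKB` class, its return values, and the paragraph-based chunking are hypothetical; only the SHA-256 content-hash check comes from the list above.

```python
import hashlib


class InMemoryKB:
    """Toy store keyed by source URI; a real deployment would use Postgres."""

    def __init__(self) -> None:
        self.hashes = {}  # source_uri -> SHA-256 hex digest of last ingested content
        self.chunks = {}  # source_uri -> list of chunk texts

    def ingest(self, source_uri: str, content: str) -> str:
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        previous = self.hashes.get(source_uri)
        if previous == digest:
            return "skipped"  # same bytes as last time: no re-chunk, no re-embed
        # New or changed content: replace all chunks for this source in one step.
        self.chunks[source_uri] = [p for p in content.split("\n\n") if p.strip()]
        self.hashes[source_uri] = digest
        return "created" if previous is None else "replaced"


kb = InMemoryKB()
uri = "file:///kb/notes/example.md"  # hypothetical path
print(kb.ingest(uri, "first paragraph\n\nsecond paragraph"))
print(kb.ingest(uri, "first paragraph\n\nsecond paragraph"))
print(kb.ingest(uri, "first paragraph, edited"))
```

Doing the hash check on the server keeps every sync agent simple: agents can re-send files freely and the store decides whether work is needed.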
Could — partial:
- ⏸️ Apple Notes import — out of scope, low ROI for actual usage
- ⏸️ Kindle highlights import — same reason
- ✅ Wiki/docs crawler integration — handled by downstream scripts
- ✅ Vendor-doc crawler (Mode C of the audit/canon pipeline): populates the `_canon` workspace
- ❌ Custom web UI: replaced by Claude clients themselves
Won’t (M1–M3) — kept:
- Multi-user support (single-user system by design)
- Real-time sync — fswatch + hourly mirror is sufficient
- Native mobile app — Claude iOS app inherits via OAuth
5. Architecture (final)
Pivoted from “RAG + Telegram Bot” → MCP Server + Multi-workspace scoped retrieval + Filesystem-first ingest. See Architecture for diagrams.
6. Tech Stack — final choices
| Layer | Original spec | Implemented | Reason for change |
|---|---|---|---|
| LLM serving | Local Llama 3.2 3B | External (caller’s Claude client) | MCP delegates LLM to caller; server only does retrieval |
| Embedder | BGE-small-en-v1.5 | bge-m3 (multilingual) | English-only embedder underperformed on mixed VN/EN; bge-m3 handles VN+EN+code-mixed |
| Reranker | (deferred) | bge-reranker-v2-m3 (cross-encoder) | +3.2pp Hit@1 on eval set, ~negligible latency at top-20 |
| Vector DB | Oracle ADB 23ai | Postgres 16 + pgvector (HNSW) | Local deploy, no cloud dependency for the hot path; the database occupies only 3.2 GB on disk |
| Object storage | (none) | MinIO S3 primary + filesystem mirror fallback | BlobStore abstraction; dual-scheme URIs (s3:// + file://) |
| Bot framework | python-telegram-bot | MCP Streamable HTTP | Native Claude integration, multi-client for free |
| HTTP framework | — | FastMCP + Starlette + uvicorn | MCP SDK provides this out-of-box |
| Tunnel | Public IP + nginx | Cloudflare named tunnel | Persistent URL, no inbound port open; optional, used only for remote-device access |
| Auth | “Telegram only” | OAuth 2.0 (PKCE+DCR) + legacy bearer | Supports mobile/web/desktop clients |
| Runtime host | Cloud VM | Local MacBook Pro M2 Max | Low latency, no cloud bill, hardware already paid for; daemon runs under launchd |
| Auto-ingest | systemd cron | launchd tako-mount-watcher (fswatch) | Real-time, no polling |
Cost posture: $0/month for compute (own hardware) + $0 for Cloudflare tunnel. No managed services in the hot path. The architecture intentionally avoids vendor lock-in — every component is OSS-replaceable.
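The object-storage row above (S3 primary, filesystem mirror fallback, dual-scheme URIs) can be illustrated with a small sketch. Everything named here is an assumption: the `BlobStore.read` method, the `DictBackend` stand-in, the `kb` bucket, the `/kb` mirror root, and the rule that both URI schemes resolve to the same object key.

```python
from typing import Protocol
from urllib.parse import urlparse


class Backend(Protocol):
    def get(self, key: str) -> bytes: ...


class DictBackend:
    """Stand-in backend; a real one would wrap an S3 client or the filesystem."""

    def __init__(self, blobs: dict) -> None:
        self.blobs = blobs

    def get(self, key: str) -> bytes:
        return self.blobs[key]  # raises KeyError when the object is missing


def to_key(uri: str, mirror_root: str = "/kb") -> str:
    """Map s3://bucket/key and file:///mirror_root/key to the same object key."""
    parsed = urlparse(uri)
    if parsed.scheme == "s3":
        return parsed.path.lstrip("/")
    if parsed.scheme == "file":
        prefix = mirror_root.rstrip("/") + "/"
        if not parsed.path.startswith(prefix):
            raise ValueError(f"path outside mirror root: {parsed.path}")
        return parsed.path[len(prefix):]
    raise ValueError(f"unsupported scheme: {parsed.scheme!r}")


class BlobStore:
    def __init__(self, primary: Backend, mirror: Backend) -> None:
        self.primary = primary
        self.mirror = mirror

    def read(self, uri: str) -> bytes:
        key = to_key(uri)
        try:
            return self.primary.get(key)  # S3 primary first
        except KeyError:
            return self.mirror.get(key)   # fall back to the filesystem mirror


store = BlobStore(
    primary=DictBackend({"ll/a.md": b"from-s3"}),
    mirror=DictBackend({"ll/a.md": b"from-fs", "ll/b.md": b"only-fs"}),
)
print(store.read("s3://kb/ll/a.md"))
print(store.read("s3://kb/ll/b.md"))
```

Resolving both schemes to one key means stored `file://` URIs from before the migration would keep working without a rewrite, which is one plausible reason to support both.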
7. Milestones — actual
| Day | What shipped |
|---|---|
| Day 1 | MCP server scaffold (FastMCP), tunnel, bearer auth |
| Day 2 | kb_ingest / kb_search / kb_stats tools, embed bench, schema applied |
| Day 3 | Bulk migrate first 5,000+ sources (~95 min wall) |
| Day 3b | Embed model swap to multilingual model (VN/EN parity) |
| Day 4-5 | Refactor sync sources to call MCP kb_ingest (filesystem-first rule) |
| Day 6 | Persistent named tunnel via custom domain |
| Day 7 | Weekly backup + restore script |
| Day 8 | OAuth 2.0 + DCR + PKCE → Claude.ai web + iOS app access |
| 2026-05-x | Multi-workspace refactor: 6 workspaces + scoped MCP tools |
| 2026-05-21 | Reranker shipped (bge-reranker-v2-m3); Hit@3 = 97.8% on held-out eval |
| 2026-05-24 | S3 migration: BlobStore abstraction (MinIO primary + FS mirror); v0.6.0-s3 |
| 2026-05-25 | _canon workspace + kb_search_canon tool for vendor authoritative docs |
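The PKCE part of the Day 8 OAuth work rests on a simple hash relationship between a client-held verifier and the challenge sent with the authorization request. The sketch below shows the S256 method as defined in RFC 7636; the function names are illustrative and this is not the server's actual code.

```python
import base64
import hashlib
import secrets


def make_code_verifier(n_bytes: int = 32) -> str:
    """Random URL-safe verifier; 32 random bytes encode to 43 characters."""
    raw = secrets.token_bytes(n_bytes)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def make_code_challenge(verifier: str) -> str:
    """S256 challenge: base64url(SHA-256(verifier)) without padding."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def verify(verifier: str, challenge: str) -> bool:
    """Server-side check at the token endpoint, using a constant-time compare."""
    return secrets.compare_digest(make_code_challenge(verifier), challenge)


verifier = make_code_verifier()
challenge = make_code_challenge(verifier)
print(len(verifier), verify(verifier, challenge))
```

The client sends only the challenge up front and reveals the verifier when exchanging the authorization code, so an intercepted code alone cannot be redeemed.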
M1 DoD passed:
- ✅ Ingest 5,000+ sources (target 10) — now 42K+
- ✅ Real queries answered correctly — Hit@3 = 97.8% on held-out eval
- ✅ Latency 840 ms p95 warm (target <3s)
8. Cost & Quota
| Item | Free? | Actual usage |
|---|---|---|
| Postgres 16 + pgvector (local) | ✅ | 3.2 GB on disk |
| MinIO S3 (local) | ✅ | ~50 GB blob mirror |
| Cloudflare named tunnel | ✅ | <1 MB/day data |
| Compute (MBP M2 Max) | ✅ (owned hw) | 388 MB idle / 501 MB active |
| Daily Postgres backup (tako-pg-backup) | ✅ | local |
| Hourly FS mirror (tako-fs-backup) | ✅ | local |
No cloud serving cost. The optional Cloudflare tunnel is only used when accessing from a non-host device (phone, other laptop).
9. Risks & open questions — outcomes
Original risks:
- Cloud DB free-tier eviction → eliminated by going local Postgres
- Embedding quality on Vietnamese text → resolved with bge-m3 (multilingual SOTA at 568M params)
- Local LLM crash recovery → N/A (LLM not used in final architecture)
Current risks:
- Single-host SPOF — mitigated by hourly FS mirror + daily Postgres dump; restore tested
- Cloudflare bot management blocking default UAs → fixed with custom UA header
- Token rotation relies on operator memory → mitigation: store tokens in a password manager
Original open Qs:
- Q1: Web UI from M2? → ❌ dropped, Claude clients are sufficient
- Q2: Privacy of sensitive content? → ✅ accepted, single-tenant local deployment; the `_secrets` workspace stays out of the MCP surface
- Q3: Reranker latency on CPU? → resolved, shipped 2026-05-21
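Reranker latency stays bounded because the expensive pairwise scorer only sees the top-N candidates from the cheap first stage, not the whole corpus. The sketch below shows that two-stage shape with toy scorers: brute-force cosine similarity stands in for the ANN index, and a word-overlap function stands in for the cross-encoder. All names and data are hypothetical.

```python
import math
from typing import Callable, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def word_overlap(query: str, text: str) -> float:
    """Toy pairwise scorer standing in for a cross-encoder reranker."""
    return float(len(set(query.lower().split()) & set(text.lower().split())))


def two_stage_search(
    query_vec: Sequence[float],
    query_text: str,
    corpus: list,
    rerank_score: Callable[[str, str], float],
    top_n: int = 20,
    top_k: int = 5,
) -> list:
    # Stage 1: cheap vector similarity over everything (an ANN index in practice).
    candidates = sorted(
        corpus, key=lambda d: cosine(query_vec, d["vec"]), reverse=True
    )[:top_n]
    # Stage 2: expensive scorer runs on at most top_n candidates.
    reranked = sorted(
        candidates, key=lambda d: rerank_score(query_text, d["text"]), reverse=True
    )
    return reranked[:top_k]


corpus = [
    {"id": "a", "vec": [1.0, 0.0], "text": "pgvector HNSW index tuning"},
    {"id": "b", "vec": [0.9, 0.1], "text": "vector database comparison for 100K vectors"},
    {"id": "c", "vec": [0.0, 1.0], "text": "vector database pricing"},
]
results = two_stage_search(
    [1.0, 0.0], "vector database for 100K vectors", corpus, word_overlap, top_n=2, top_k=1
)
print([d["id"] for d in results])
```

In this toy run the first stage ranks document `a` highest, and the second stage promotes `b`; the cost of stage 2 depends on `top_n`, not on corpus size.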
10. Definition of Done
M1 Done: ✅ 2026-04-29 — initial corpus ingested, OAuth flow live, multilingual search working, DR backup in place.
S3 milestone done: ✅ 2026-05-24 — BlobStore abstraction shipped, 1000-row regression at 100% parity, hourly + daily backups via launchd.
M3 Done (production-ready):
- ⏳ TOTP 2FA or SSO wrap on `/login`
- ✅ Reranker eval on top-N candidates → measured +3.2pp Hit@1
- ✅ Daily-driver criterion: in continuous personal use across 4+ weeks without architecture pivot
See also
- Implementation — technical deep-dive (deploy, code structure, perf numbers)
- Architecture — component diagrams, data flow, security model
- Notes — chronological decision log