Implementation
Sister docs: PRD (intent), Architecture (system view), Notes (decision log).
TL;DR
A production-ready personal RAG system in continuous personal use:
- 42,000+ sources / 182,000+ chunks across 6 workspaces (mixed Vietnamese + English text)
- Postgres 16 + pgvector with HNSW; 3.2 GB on disk
- bge-m3 multilingual embedder (1024-dim) + bge-reranker-v2-m3 cross-encoder rerank
- MCP Streamable HTTP server (FastMCP) with OAuth 2.0 (PKCE + DCR) — accessible from Claude Desktop, Claude.ai web, Claude iOS app
- Workspace-scoped tools: kb_search_ll, kb_search_mindx, kb_search_personal, kb_search_shared, kb_search_canon
- Latency: p95 840 ms warm / 2.3 s cold end-to-end
- Resource: 388 MB idle / 501 MB active on MacBook Pro M2 Max
- S3-native storage (since 2026-05-24): MinIO BlobStore primary + filesystem mirror fallback; dual-scheme URIs (s3:// + file://)
- Auto-ingest: launchd tako-mount-watcher watches the KB mount path via fswatch
- Backups: launchd tako-pg-backup (daily) + tako-fs-backup (hourly)
Stack
| Layer | Component | Version / Notes |
|---|---|---|
| Compute | MacBook Pro M2 Max | local; daemon footprint 388 MB idle / 501 MB active |
| OS | macOS | launchd-managed jobs |
| Runtime | Python | 3.11 + venv |
| MCP SDK | mcp | Streamable HTTP transport, built-in OAuth scaffolding |
| HTTP server | uvicorn + Starlette | 127.0.0.1:8080 (tunnel-fronted when remote access needed) |
| Embedding | bge-m3 via sentence-transformers | 1024-dim, multilingual (incl. Vietnamese), MPS-accelerated |
| Reranker | bge-reranker-v2-m3 | cross-encoder, top-N rerank stage |
| Vector DB | Postgres 16 + pgvector | HNSW index, cosine distance |
| Driver | asyncpg | local socket / TCP |
| Blob store | MinIO (S3-compatible) | primary; FS mirror fallback |
| Mirror tool | rclone | hourly S3 → FS sync |
| Tunnel | cloudflared | named tunnel; optional, only for remote-device access |
| Auth | OAuth 2.0 + PBKDF2 password | sha256 200K iters; DCR-enabled, refresh rotation |
| Process manager | launchd | ai.tako.mcp, ai.tako.mount-watcher, ai.tako.pg-backup, ai.tako.fs-backup, ai.tako.cloudflared |
The internal package name in code remains tako; this doc uses the product name Personal-RAG for clarity.
Directory layout
Server (~/Documents/Side.Projects/tako/server/)
src/
├── server_local.py # FastMCP wrapper, Starlette routes
├── oauth_provider.py # OAuthAuthorizationServerProvider impl
├── oauth_login.py # /login GET + POST routes
├── blobstore.py # BlobStore abstraction (MinIO + FS fallback)
├── workspaces.py # WORKSPACE_MAP + scoped tool factory
├── bulk_migrate.py # bulk re-ingest helper
├── re_embed.py # re-embed tool for model swaps
├── backup.py # pg_dump driver
└── restore.py # disaster recovery driver
lib/
├── instructions.py # MCP server orchestration playbook (sent as instructions in the initialize result)
└── classifiers.py # path → workspace + source_type rules
Operator host (~/.claude/)
hooks/
├── save-convo.py # Stop hook → archive AI session + auto-ingest
├── kb-ingest-file.py # Generic file → kb_ingest helper (workspace-aware)
└── kb-ingest-file.log
launchd/
├── ai.tako.mcp.plist
├── ai.tako.mount-watcher.plist
├── ai.tako.pg-backup.plist
├── ai.tako.fs-backup.plist
└── ai.tako.cloudflared.plist
KB mount (~/Documents/KB-s3/)
KB-s3/
├── ll/ # work — partitioned by client (pcf/, newlife/, ilham/, bbl/, _generic/, _multi/)
├── mindx/ # consulting
├── _personal/ # side projects, memory, finance, health
├── _shared/ # cross-cutting research
├── _canon/ # vendor authoritative docs (Anthropic/xAI/HF/arxiv)
└── _secrets/ # encrypted vault — NOT exposed via MCP
The mount path is the canonical write target after the S3 migration. Files written under the mount get the URI s3://tako-kb/<key>; legacy paths outside the mount keep file://<abs> URIs and stay indexed for backward compatibility.
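The dual-scheme rule above can be sketched as a small path-to-URI function. This is a minimal illustration, not the project's code: `source_uri_for`, `KB_MOUNT`, and the example home directory are hypothetical names, and the function assumes it receives absolute POSIX paths.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Assumed values for illustration; the real mount path and bucket come from config.
KB_MOUNT = PurePosixPath("/Users/me/Documents/KB-s3")
BUCKET = "tako-kb"


def source_uri_for(path: str, mount: PurePosixPath = KB_MOUNT, bucket: str = BUCKET) -> str:
    """Return s3://<bucket>/<key> for files under the mount, file://<abs> otherwise."""
    p = PurePosixPath(path)
    try:
        key = p.relative_to(mount)   # raises ValueError when p is outside the mount
    except ValueError:
        return f"file://{p}"         # legacy location: keep the absolute path
    return f"s3://{bucket}/{key}"
```

For example, `source_uri_for("/Users/me/Documents/KB-s3/ll/pcf/notes.md")` yields `s3://tako-kb/ll/pcf/notes.md`, while a path outside the mount keeps its `file://` form.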
Schema
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE kb_sources (
id BIGSERIAL PRIMARY KEY,
workspace TEXT NOT NULL, -- 'll' | 'mindx' | '_personal' | '_shared' | '_canon' | '_secrets'
client TEXT, -- 'PCF' | 'NewLife' | 'Ilham' | 'BBL' | '_generic' | '_multi' | NULL
source_uri TEXT NOT NULL, -- s3://tako-kb/... or file://...
source_type TEXT,
title TEXT,
tags TEXT, -- comma-sep
metadata JSONB,
content_hash TEXT, -- sha256, idempotency key
supersedes BIGINT REFERENCES kb_sources(id),
archived_at TIMESTAMPTZ,
ingested_at TIMESTAMPTZ DEFAULT now(),
updated_at TIMESTAMPTZ DEFAULT now(),
CONSTRAINT uq_kb_sources_uri UNIQUE (source_uri)
);
CREATE INDEX idx_kb_sources_ws_client ON kb_sources(workspace, client);
CREATE INDEX idx_kb_sources_type ON kb_sources(source_type);
CREATE INDEX idx_kb_sources_ingested ON kb_sources(ingested_at);
CREATE TABLE kb_chunks (
id BIGSERIAL PRIMARY KEY,
source_id BIGINT NOT NULL REFERENCES kb_sources(id) ON DELETE CASCADE,
chunk_idx INT NOT NULL,
text TEXT NOT NULL,
embedding vector(1024), -- bge-m3
CONSTRAINT uq_kb_chunks_src_idx UNIQUE (source_id, chunk_idx)
);
CREATE INDEX idx_kb_chunks_source ON kb_chunks(source_id);
CREATE INDEX kb_emb_hnsw ON kb_chunks
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
Why this shape:
- (workspace, client) index → scoped tools filter cheaply
- source_uri UNIQUE + content_hash → ingest is idempotent
- supersedes + archived_at → audit trail for evolving docs (e.g. PRD v1 → v2)
- HNSW with vector_cosine_ops → fast ANN; rebuild after bulk loads
- ON DELETE CASCADE → drop source → chunks auto-removed
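To show how this schema supports scoped search, here is a hedged sketch of a query builder using pgvector's cosine-distance operator (`<=>`) and asyncpg-style `$n` placeholders. The function name, the `archived_at IS NULL` filter, and passing the embedding as a text literal cast to `vector` are assumptions for illustration, not the server's actual query.

```python
from __future__ import annotations


def build_search_sql(query_vec: list[float], workspace: str,
                     client: str | None = None, top_n: int = 20) -> tuple[str, list]:
    """Return (sql, params) for a workspace-scoped cosine-distance search."""
    # pgvector accepts the text form '[x1,x2,...]' cast to vector.
    params: list = ["[" + ",".join(repr(float(x)) for x in query_vec) + "]", workspace]
    # Assumption: archived sources are hidden from search results.
    where = ["s.workspace = $2", "s.archived_at IS NULL"]
    if client is not None:
        # Tenant filter: the named client plus the shared partitions.
        where.append("s.client = ANY($3::text[])")
        params.append([client, "_generic", "_multi"])
    sql = (
        "SELECT c.id, c.text, s.source_uri, c.embedding <=> $1::vector AS distance "
        "FROM kb_chunks c JOIN kb_sources s ON s.id = c.source_id "
        f"WHERE {' AND '.join(where)} "
        f"ORDER BY c.embedding <=> $1::vector LIMIT {int(top_n)}"
    )
    return sql, params
```

The workspace and client values travel as bound parameters rather than being interpolated into the SQL string, which is one way to keep the scoping enforced at the SQL level.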
MCP tools surface
| Tool | Args | Purpose |
|---|---|---|
| kb_health | (none) | DB ping + sources/chunks count |
| kb_ingest | text, source_uri, source_type, title, tags[], metadata{}, workspace | Idempotent insert/update |
| kb_search_ll | query, client_filter, top_k=5, tag_filter, source_type | Work-workspace search with multi-tenant filter |
| kb_search_mindx | query, top_k=5, … | MindX consulting workspace |
| kb_search_personal | query, top_k=5, … | Personal workspace |
| kb_search_shared | query, top_k=5, … | Shared research notes |
| kb_search_canon | query, top_k=5, … | External vendor authoritative docs |
| kb_search_all | query, top_k=5, … | Cross-workspace (use sparingly) |
| kb_stats | (none) | Total + breakdown by workspace / source_type |
Each tool is roughly 30–60 LOC. The FastMCP @mcp.tool() decorator handles JSON Schema generation, argument parsing, and response packaging.
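The per-workspace search tools in the table share one shape, so they lend themselves to a factory. The sketch below is an assumption about how such a factory could look, not the contents of workspaces.py: `make_scoped_tools`, `_scoped`, and the `backend` callable are hypothetical, and FastMCP registration is left out so the example runs on the standard library alone.

```python
from typing import Callable

# Tool suffix -> workspace value, following the tool table and schema comments.
WORKSPACE_MAP = {
    "ll": "ll",
    "mindx": "mindx",
    "personal": "_personal",
    "shared": "_shared",
    "canon": "_canon",
}


def _scoped(backend: Callable[..., list], name: str, workspace: str) -> Callable[..., list]:
    """Bind one workspace into a search callable so callers cannot override it."""
    def tool(query: str, top_k: int = 5, **filters) -> list:
        # Passing workspace=... in filters raises TypeError (duplicate keyword).
        return backend(query, workspace=workspace, top_k=top_k, **filters)
    tool.__name__ = name
    return tool


def make_scoped_tools(backend: Callable[..., list]) -> dict[str, Callable[..., list]]:
    """Return one kb_search_<suffix> callable per entry in WORKSPACE_MAP."""
    return {
        f"kb_search_{suffix}": _scoped(backend, f"kb_search_{suffix}", ws)
        for suffix, ws in WORKSPACE_MAP.items()
    }
```

Binding the workspace in a helper function (rather than inside a loop body) avoids the late-binding closure pitfall, where every generated tool would otherwise see the last workspace in the map.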
Chunking strategy
Chunking uses a sliding token window with overlap. The window is sized well under the 8192-token maximum sequence length of bge-m3, which keeps per-chunk semantic density high. A cap on chunks per file stops extremely long transcripts (100K+ tokens) from blowing up storage: only the first ~50 windows are embedded (head-only), which capture intent and key decisions.
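A minimal sketch of overlapping-window chunking with a head-only cap follows. The window size (512) and overlap (64) are placeholder values chosen for illustration; the source only states the 8192-token ceiling and the cap of roughly 50 windows. Tokens are modeled as a plain list so the example stays independent of any tokenizer.

```python
from __future__ import annotations


def window_chunks(tokens: list[str], window: int = 512, overlap: int = 64,
                  max_chunks: int = 50) -> list[list[str]]:
    """Split tokens into overlapping windows, keeping only the first max_chunks."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must be in [0, window)")
    step = window - overlap
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        if len(chunks) == max_chunks:
            break                    # safety cap: the tail of very long files is not embedded
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break                    # this window already reaches the end
    return chunks
```

With these placeholder values, a 100K-token transcript would yield 50 chunks covering roughly the first 22K tokens, and the rest would be skipped.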
Embedding pipeline (bge-m3)
bge-m3 does not require explicit task prefixes (unlike e5), so text is passed to the embedder as-is; encoding is batched for throughput. The cross-encoder rerank stage takes the top-N (default 20) candidates from the ANN stage, reorders them by bge-reranker-v2-m3 relevance score, and returns the top-K.
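The two-stage flow described above (ANN candidates, then cross-encoder reordering) can be sketched with a pluggable scorer. The `rerank` function and its `score` parameter are hypothetical; in the real pipeline the scorer would wrap bge-reranker-v2-m3, whereas here any `(query, text) -> float` callable works so the example stays self-contained.

```python
from typing import Callable, Sequence


def rerank(query: str, candidates: Sequence[str], score: Callable[[str, str], float],
           top_n: int = 20, top_k: int = 5) -> list[str]:
    """Re-score the first top_n ANN candidates and return the best top_k."""
    head = list(candidates[:top_n])      # candidates arrive ordered by ANN distance
    # sorted() is stable, so ties keep their ANN order.
    ranked = sorted(head, key=lambda text: score(query, text), reverse=True)
    return ranked[:top_k]
```

Only the top-N head is re-scored, so the cost of the cross-encoder stays bounded regardless of how many chunks the ANN stage could return.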
Measured retrieval quality (93-query held-out personal eval, 2026-05-21):
| Stage | Hit@1 | Hit@3 | MRR |
|---|---|---|---|
| bge-m3 only | 86.0% | 97.8% | 0.918 |
| bge-m3 + bge-reranker-v2-m3 | 89.2% | 97.8% | 0.948 |
The reranker was shipped as the default for its modest Hit@1 lift (86.0% → 89.2%) and MRR gain (0.918 → 0.948); Hit@3 is unchanged at 97.8%.
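For reference, the metrics in the table follow the standard definitions, sketched below. The input format (the 1-based rank of the first relevant result per query, or `None` when nothing relevant was returned) is an assumption about how an eval harness might record results.

```python
from __future__ import annotations


def hit_at_k(ranks: list[int | None], k: int) -> float:
    """Fraction of queries whose first relevant result is at rank <= k (1-based)."""
    return sum(1 for r in ranks if r is not None and r <= k) / len(ranks)


def mrr(ranks: list[int | None]) -> float:
    """Mean reciprocal rank; a query with no relevant result contributes 0."""
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)
```

On a 93-query set, each additional top-1 hit moves Hit@1 by about 1.1 points, so differences of a few points correspond to a handful of queries.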
Auth dual-mode
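# oauth_provider.py: load_access_token() falls back to the legacy bearer token
import secrets
from mcp.server.auth.provider import AccessToken

# Method of the OAuthAuthorizationServerProvider implementation;
# LEGACY_BEARER and _load_state() are defined elsewhere in this module.
async def load_access_token(self, token: str) -> AccessToken | None:
    # Constant-time comparison avoids leaking the bearer through timing.
    if secrets.compare_digest(token, LEGACY_BEARER):
        return AccessToken(token=token, client_id="_legacy_bearer", scopes=["mcp"])
    stored = _load_state()["access_tokens"].get(token)
    return AccessToken.model_validate(stored) if stored is not None else None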
This allows:
- Claude Desktop (mcp-remote bridge → localhost:8080 direct, bearer in header): primary client
- Claude.ai web (full OAuth flow with DCR + PKCE, via Cloudflare tunnel)
- Claude iOS app: inherits from the claude.ai web account
- curl scripts on host: bearer for ad-hoc admin
OAuth flow (sequence)
sequenceDiagram
autonumber
participant C as Claude.ai
participant B as Browser
participant S as MCP Server
participant DB as State Store
C->>S: GET /.well-known/oauth-authorization-server
S-->>C: AS metadata (RFC 8414)
C->>S: POST /register (DCR)
S->>DB: Save client
S-->>C: 201 {client_id}
C->>B: Redirect to /authorize?<br/>client_id, redirect_uri, code_challenge
B->>S: GET /authorize
S->>DB: Create session
S-->>B: 302 → /login?session=sid
B->>S: GET /login (password form)
S-->>B: HTML form
B->>S: POST /login (password)
S->>S: PBKDF2 verify (200K iters)
S->>DB: Generate auth code
S-->>B: 302 → redirect_uri?code=...&state=...
B->>C: Forwards code
C->>S: POST /token (code + verifier)
S->>S: Verify PKCE challenge
S->>DB: Issue access_token + refresh_token
S-->>C: 200 {access_token, refresh_token, expires_in: 3600}
Note over C,S: Subsequent MCP calls
C->>S: POST /mcp + Authorization: Bearer
S->>DB: Validate token
S-->>C: MCP response
Note over C,S: After 1 hour
C->>S: POST /token (grant_type=refresh_token)
S->>DB: Rotate refresh + new access
S-->>C: 200 {new tokens}
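The "Verify PKCE challenge" step in the sequence above can be sketched as follows. This assumes the S256 challenge method from RFC 7636 (the diagram does not name the method), and the function names are illustrative rather than taken from oauth_provider.py.

```python
import base64
import hashlib
import secrets


def pkce_challenge(verifier: str) -> str:
    """S256 challenge: base64url(SHA-256(verifier)) without '=' padding (RFC 7636)."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def verify_pkce(verifier: str, stored_challenge: str) -> bool:
    """Check the verifier sent to /token against the challenge saved at /authorize."""
    return secrets.compare_digest(pkce_challenge(verifier), stored_challenge)
```

The server stores only the challenge during /authorize, so an intercepted authorization code is useless without the verifier that the client presents at /token.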
Ingest flow (filesystem-first)
[Skill / Hook / Crawler]
│ writes
▼
<kb-mount>/<workspace>/<folder>/<slug>.md ← canonical, source-of-truth
│
│ fswatch event → tako-mount-watcher launchd job
│ exec python3 <hooks>/kb-ingest-file.py <path>
▼
[kb-ingest-file.py]
│ reads file, classifies workspace + source_type from path, auto-tags
│ HTTP POST localhost:8080/mcp tools/call kb_ingest
▼
[Server: kb_ingest tool]
│ SHA-256(content), check kb_sources.content_hash
│ if match → return {skipped: true}
│ if differ → DELETE old chunks, recompute
│ chunk(text), bge-m3 encode (1024-dim)
│ INSERT kb_sources (or UPDATE), INSERT kb_chunks bulk
│ persist blob via BlobStore (s3:// primary, file:// fallback)
▼
Postgres + MinIO
Why filesystem-first: the local filesystem (with Time Machine + hourly MinIO mirror) is durable. Postgres + MinIO are treated as derived indexes — both re-buildable from filesystem. Worst case = re-embed cost (~hours), no data loss.
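The skip-or-reingest decision in the kb_ingest step can be sketched with an in-memory stand-in for kb_sources. The `ingest_decision` name and the dict-based store are hypothetical; the real tool compares against kb_sources.content_hash in Postgres and then deletes, re-chunks, and re-embeds on a mismatch.

```python
import hashlib


def ingest_decision(store: dict[str, str], source_uri: str, text: str) -> str:
    """Return 'skipped', 'inserted', or 'updated' and record the new content hash."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    previous = store.get(source_uri)     # stands in for kb_sources.content_hash
    if previous == digest:
        return "skipped"                 # identical content: no re-embedding needed
    store[source_uri] = digest
    # A changed hash means old chunks are deleted and the text is re-embedded.
    return "inserted" if previous is None else "updated"
```

Because the decision depends only on the URI and the content hash, replaying the whole mount through the ingest hook is safe: unchanged files cost one hash and one lookup each.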
Performance numbers
Measured on MacBook Pro M2 Max (MPS-accelerated embedder):
| Operation | Number | Notes |
|---|---|---|
| Daemon memory — idle | 388 MB | model loaded, no inflight |
| Daemon memory — active | 501 MB | during search |
| Postgres DB size | 3.2 GB | sources + chunks + indexes |
| Sources indexed | 42,000+ | across 6 workspaces |
| Chunks indexed | 182,000+ | bge-m3 1024-dim |
| kb_search_* p95 (warm) | 840 ms | embed + ANN + rerank + tunnel-or-local |
| kb_search_* p95 (cold) | 2.3 s | includes model load |
| kb_health | <80 ms | DB ping |
| Hit@3 on held-out eval | 97.8% | 93 personal-workspace queries |
| MRR on held-out eval | 0.948 | with reranker |
| Post-S3 regression test | 100% parity | 1000-row sample, 2026-05-24 |
Reliability features
| Feature | How |
|---|---|
| Idempotent ingest | content_hash compare → skip duplicate uploads |
| Filesystem source-of-truth | re-buildable from ~/Documents/KB-s3/ |
| Real-time auto-ingest | tako-mount-watcher launchd + fswatch |
| Hourly blob mirror | tako-fs-backup rclone MinIO → FS |
| Daily Postgres dump | tako-pg-backup |
| Auto-restart MCP server | launchd KeepAlive |
| Auto-restart tunnel | launchd KeepAlive |
| BlobStore fallback | MinIO down → BlobStore writes/reads from FS mirror automatically |
| Token rotation | refresh_token rotated on use (one-time) |
| Token TTL | access 1 h, refresh 30 d, auth code 10 min |
| GC | expired sessions/codes purged on each provider call |
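The token rows above (one-time refresh rotation, TTLs, purge on each provider call) can be sketched together. The state layout here, a dict mapping each token to its expiry timestamp, is a simplification assumed for illustration; the real state store keeps full token records.

```python
from __future__ import annotations

import secrets
import time

ACCESS_TTL = 3600               # 1 h
REFRESH_TTL = 30 * 24 * 3600    # 30 d


def purge_expired(tokens: dict[str, float], now: float) -> None:
    """Drop every entry whose expiry timestamp has passed."""
    for tok in [t for t, exp in tokens.items() if exp <= now]:
        del tokens[tok]


def rotate_refresh(state: dict, refresh_token: str,
                   now: float | None = None) -> tuple[str, str] | None:
    """Consume a refresh token and issue a new access/refresh pair, or None if invalid."""
    now = time.time() if now is None else now
    purge_expired(state["access_tokens"], now)
    purge_expired(state["refresh_tokens"], now)
    if state["refresh_tokens"].pop(refresh_token, None) is None:
        return None             # unknown, expired, or already used
    access, refresh = secrets.token_urlsafe(32), secrets.token_urlsafe(32)
    state["access_tokens"][access] = now + ACCESS_TTL
    state["refresh_tokens"][refresh] = now + REFRESH_TTL
    return access, refresh
```

Removing the old refresh token before issuing the new pair is what makes it single-use: a replayed refresh token finds nothing to pop and is rejected.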
Security model
| Threat | Mitigation |
|---|---|
| Public MCP endpoint exposure | Local-only by default (localhost:8080); CF tunnel is opt-in for remote-device access. Auth required either way. |
| Token theft | TLS only (CF-managed cert). Tokens random 32-byte urlsafe. PBKDF2 200K rounds for password. |
| Replay attack | Auth codes single-use (consumed on exchange). Refresh tokens rotated. |
| Data exfil from compromised host | Postgres + MinIO bound to localhost; egress requires explicit tunnel. |
| Brute-force password | Single-user; CF rate-limit upstream provides some protection. TODO: add rate limit on /login. |
| Sensitive content | Embeddings are lossy and hard to invert (not a cryptographic guarantee); original text is only retrievable via authenticated MCP. The _secrets workspace has no MCP tool; only the vault flow accesses it. |
| Workspace bleed | Scoped tools enforce workspace at SQL level; client filter for LL further restricts to tenant + _generic + _multi. |
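The password check behind /login (PBKDF2 with SHA-256 at 200K iterations, per the table above) could look like the sketch below. The function names, the 16-byte salt, and returning the salt alongside the hash are assumptions for illustration; the source does not describe how server/.oauth_env encodes them.

```python
from __future__ import annotations

import hashlib
import secrets

ITERATIONS = 200_000


def hash_password(password: str, salt: bytes | None = None) -> tuple[bytes, bytes]:
    """Derive a PBKDF2-HMAC-SHA256 hash; generates a random salt when none is given."""
    salt = secrets.token_bytes(16) if salt is None else salt
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, derived


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    _, derived = hash_password(password, salt)
    return secrets.compare_digest(derived, expected)
```

The high iteration count slows each guess, which complements but does not replace the /login rate limit listed as a TODO.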
Reproducibility — quickstart for a forker
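# 1. Install Postgres 16 + pgvector
brew install postgresql@16
brew services start postgresql@16
# pgvector ships separately from postgresql@16; install a build that targets
# Postgres 16 before running CREATE EXTENSION (see the pgvector README)
psql -d postgres -c "CREATE DATABASE ragkb;"
psql -d ragkb -c "CREATE EXTENSION vector;"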
# 2. Install MinIO server + mc client, then start MinIO
brew install minio/stable/minio minio/stable/mc
minio server ~/minio-data &
mc alias set local http://localhost:9000 minioadmin minioadmin
mc mb local/tako-kb
# 3. Clone and bootstrap the server
cd ~/Documents/Side.Projects/
git clone <your-fork>/tako && cd tako
python3.11 -m venv venv
./venv/bin/pip install -r requirements.txt # mcp[cli], starlette, uvicorn,
# asyncpg, pgvector, sentence-transformers,
# boto3, fastmcp
# 4. Apply schema (see schema section above)
psql -d ragkb -f schema.sql
# 5. Generate secrets
openssl rand -hex 32 > server/.token
openssl rand -base64 24 | tee /tmp/pw # → PBKDF2 → server/.oauth_env
# 6. Install launchd plists
cp launchd/*.plist ~/Library/LaunchAgents/
launchctl load ~/Library/LaunchAgents/ai.tako.*
# 7. Ingest your existing KB
find ~/Documents/KB-s3/ -name "*.md" -print0 | \
xargs -0 -n1 python3 hooks/kb-ingest-file.py
# 8. (Optional) Cloudflare tunnel for remote-device access
cloudflared tunnel create personal-rag
cloudflared tunnel route dns <tunnel-id> rag.yourdomain.com
# add ingress in ~/.cloudflared/config.yml → http://localhost:8080
# 9. Add custom connector on claude.ai with URL https://rag.yourdomain.com/mcp
Total: 1–2 hours if Postgres + MinIO + a domain are ready.
Future work
- Rate limit on /login (5 req/min per IP)
- TOTP 2FA via pyotp
- Hybrid BM25 + vector for keyword-heavy queries (e.g. issue IDs, product codes)
- Per-workspace embedder selection (e.g. code-specific for _canon arxiv subset)
License & attribution
Personal project. Built on:
- Model Context Protocol by Anthropic
- bge-m3 + bge-reranker-v2-m3 by BAAI
- pgvector
- MinIO
- Cloudflare Tunnel free tier