● M1 done P0 Size S Foundation

Personal-RAG — Implementation

Tech stack deep-dive: schema, OAuth flow, performance numbers, security model, reproducibility.


Sister docs: PRD (intent), Architecture (system view), Notes (decision log).

TL;DR

A production-ready personal RAG system in continuous use:

  • 42,000+ sources / 182,000+ chunks across 6 workspaces (mixed Vietnamese + English text)
  • Postgres 16 + pgvector with HNSW; 3.2 GB on disk
  • bge-m3 multilingual embedder (1024-dim) + bge-reranker-v2-m3 cross-encoder rerank
  • MCP Streamable HTTP server (FastMCP) with OAuth 2.0 (PKCE + DCR) — accessible from Claude Desktop, Claude.ai web, Claude iOS app
  • Workspace-scoped tools: kb_search_ll, kb_search_mindx, kb_search_personal, kb_search_shared, kb_search_canon
  • Latency: p95 840 ms warm / 2.3 s cold end-to-end
  • Resource: 388 MB idle / 501 MB active on MacBook Pro M2 Max
  • S3-native storage (since 2026-05-24): MinIO BlobStore primary + filesystem mirror fallback; dual-scheme URIs (s3:// + file://)
  • Auto-ingest: launchd tako-mount-watcher watches the KB mount path via fswatch
  • Backups: launchd tako-pg-backup (daily) + tako-fs-backup (hourly)

Stack

Layer | Component | Version / Notes
--- | --- | ---
Compute | MacBook Pro M2 Max | local; daemon footprint 388 MB idle / 501 MB active
OS | macOS | launchd-managed jobs
Runtime | Python | 3.11 + venv
MCP SDK | mcp | Streamable HTTP transport, built-in OAuth scaffolding
HTTP server | uvicorn + Starlette | 127.0.0.1:8080 (tunnel-fronted when remote access is needed)
Embedding | bge-m3 via sentence-transformers | 1024-dim, multilingual (incl. Vietnamese), MPS-accelerated
Reranker | bge-reranker-v2-m3 | cross-encoder, top-N rerank stage
Vector DB | Postgres 16 + pgvector | HNSW index, cosine distance
Driver | asyncpg | local socket / TCP
Blob store | MinIO (S3-compatible) | primary; FS mirror fallback
Mirror tool | rclone | hourly S3 → FS sync
Tunnel | cloudflared | named tunnel; optional, only for remote-device access
Auth | OAuth 2.0 + PBKDF2 password | sha256, 200K iterations; DCR-enabled, refresh rotation
Process manager | launchd | ai.tako.mcp, ai.tako.mount-watcher, ai.tako.pg-backup, ai.tako.fs-backup, ai.tako.cloudflared

The internal package name in code remains tako; this doc uses the product name Personal-RAG for clarity.

Directory layout

Server (~/Documents/Side.Projects/tako/server/)

src/
├── server_local.py            # FastMCP wrapper, Starlette routes
├── oauth_provider.py          # OAuthAuthorizationServerProvider impl
├── oauth_login.py             # /login GET + POST routes
├── blobstore.py               # BlobStore abstraction (MinIO + FS fallback)
├── workspaces.py              # WORKSPACE_MAP + scoped tool factory
├── bulk_migrate.py            # bulk re-ingest helper
├── re_embed.py                # re-embed all chunks after an embedder (model) swap
├── backup.py                  # pg_dump driver
└── restore.py                 # disaster recovery driver
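The tree describes blobstore.py only as a BlobStore abstraction with MinIO as primary and a filesystem fallback. A minimal sketch of that fallback pattern follows, assuming a hypothetical two-method backend interface (put/get) and treating a builtin ConnectionError from the primary as "MinIO is down". FSBackend, FallbackBlobStore, and the URI each backend returns are illustrative, not the project's actual classes.

```python
from pathlib import Path
from typing import Protocol


class BlobBackend(Protocol):
    """Hypothetical minimal backend interface: store and fetch bytes by key."""

    def put(self, key: str, data: bytes) -> str: ...
    def get(self, key: str) -> bytes: ...


class FSBackend:
    """Filesystem mirror: blobs live under a root directory, addressed by file:// URIs."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def put(self, key: str, data: bytes) -> str:
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
        return path.resolve().as_uri()

    def get(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


class FallbackBlobStore:
    """Use the primary (e.g. a MinIO adapter); on ConnectionError fall back to the FS mirror."""

    def __init__(self, primary: BlobBackend, fallback: BlobBackend) -> None:
        self.primary = primary
        self.fallback = fallback

    def put(self, key: str, data: bytes) -> str:
        try:
            return self.primary.put(key, data)
        except ConnectionError:
            return self.fallback.put(key, data)

    def get(self, key: str) -> bytes:
        try:
            return self.primary.get(key)
        except ConnectionError:
            return self.fallback.get(key)
```

A real MinIO adapter (e.g. one wrapping boto3) would have to translate its client's connection failures into ConnectionError, since botocore raises its own exception types. Reads through the fallback only succeed for objects the hourly rclone mirror has already copied, so an object written to MinIO shortly before an outage may be missing from the mirror.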

lib/
├── instructions.py            # MCP server orchestration playbook (sent as instructions in the initialize response)
└── classifiers.py             # path → workspace + source_type rules

Operator host (~/.claude/)

hooks/
├── save-convo.py              # Stop hook → archive AI session + auto-ingest
├── kb-ingest-file.py          # Generic file → kb_ingest helper (workspace-aware)
└── kb-ingest-file.log

launchd/
├── ai.tako.mcp.plist
├── ai.tako.mount-watcher.plist
├── ai.tako.pg-backup.plist
├── ai.tako.fs-backup.plist
└── ai.tako.cloudflared.plist

KB mount (~/Documents/KB-s3/)

KB-s3/
├── ll/         # work — partitioned by client (pcf/, newlife/, ilham/, bbl/, _generic/, _multi/)
├── mindx/      # consulting
├── _personal/  # side projects, memory, finance, health
├── _shared/    # cross-cutting research
├── _canon/     # vendor authoritative docs (Anthropic/xAI/HF/arxiv)
└── _secrets/   # encrypted vault — NOT exposed via MCP
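lib/classifiers.py derives workspace and source_type from the path. A sketch of the workspace/client half, assuming the first folder under the mount names the workspace and, under ll/, the second folder names the client. The folder-to-label map mirrors the client values in the schema comment but is an assumption; source_type rules are not documented, so they are left out. classify, WORKSPACES, and LL_CLIENTS are hypothetical names.

```python
from __future__ import annotations

from pathlib import Path

# Workspace folders as laid out under the KB mount.
WORKSPACES = {"ll", "mindx", "_personal", "_shared", "_canon", "_secrets"}

# Hypothetical mapping from ll/ sub-folder to the label stored in kb_sources.client.
LL_CLIENTS = {
    "pcf": "PCF",
    "newlife": "NewLife",
    "ilham": "Ilham",
    "bbl": "BBL",
    "_generic": "_generic",
    "_multi": "_multi",
}


def classify(path: Path, mount: Path) -> tuple[str, str | None]:
    """Return (workspace, client) for a file under the KB mount."""
    # relative_to raises ValueError if the file is not under the mount at all.
    parts = path.resolve().relative_to(mount.resolve()).parts
    if len(parts) < 2 or parts[0] not in WORKSPACES:
        raise ValueError(f"not inside a known workspace folder: {path}")
    workspace = parts[0]
    client = None
    if workspace == "ll" and len(parts) >= 3:
        client = LL_CLIENTS.get(parts[1])
    return workspace, client
```

Raising on an unknown top-level folder, rather than guessing, keeps a stray file from being indexed into the wrong workspace.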

Since the S3 migration, the mount path is the canonical write target. Files written under the mount get an s3://tako-kb/<key> URI; legacy paths outside the mount keep their file://<abs> URIs and remain indexed for backward compatibility.
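The URI rule as a sketch, assuming the object key is simply the path relative to the mount with '/' separators (the doc does not spell out key derivation). source_uri and BUCKET are hypothetical names.

```python
from pathlib import Path

BUCKET = "tako-kb"


def source_uri(path: Path, mount: Path) -> str:
    """s3://<bucket>/<key> for files under the mount, file://<abs> otherwise."""
    abs_path = path.resolve()
    try:
        # Assumed key derivation: path relative to the mount, POSIX separators.
        key = abs_path.relative_to(mount.resolve()).as_posix()
    except ValueError:
        return abs_path.as_uri()  # legacy path outside the mount
    return f"s3://{BUCKET}/{key}"
```

Because source_uri is the UNIQUE key in kb_sources, this mapping has to be deterministic: the same file must always yield the same URI, or a re-ingest would insert a duplicate source instead of updating the existing one. Path.as_uri() also percent-encodes characters such as spaces in the legacy file:// form.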

Schema

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE kb_sources (
    id            BIGSERIAL PRIMARY KEY,
    workspace     TEXT NOT NULL,          -- 'll' | 'mindx' | '_personal' | '_shared' | '_canon' | '_secrets'
    client        TEXT,                   -- 'PCF' | 'NewLife' | 'Ilham' | 'BBL' | '_generic' | '_multi' | NULL
    source_uri    TEXT NOT NULL,          -- s3://tako-kb/... or file://...
    source_type   TEXT,
    title         TEXT,
    tags          TEXT,                   -- comma-sep
    metadata      JSONB,
    content_hash  TEXT,                   -- sha256, idempotency key
    supersedes    BIGINT REFERENCES kb_sources(id),
    archived_at   TIMESTAMPTZ,
    ingested_at   TIMESTAMPTZ DEFAULT now(),
    updated_at    TIMESTAMPTZ DEFAULT now(),
    CONSTRAINT uq_kb_sources_uri UNIQUE (source_uri)
);
CREATE INDEX idx_kb_sources_ws_client ON kb_sources(workspace, client);
CREATE INDEX idx_kb_sources_type      ON kb_sources(source_type);
CREATE INDEX idx_kb_sources_ingested  ON kb_sources(ingested_at);

CREATE TABLE kb_chunks (
    id          BIGSERIAL PRIMARY KEY,
    source_id   BIGINT NOT NULL REFERENCES kb_sources(id) ON DELETE CASCADE,
    chunk_idx   INT NOT NULL,
    text        TEXT NOT NULL,
    embedding   vector(1024),             -- bge-m3
    CONSTRAINT uq_kb_chunks_src_idx UNIQUE (source_id, chunk_idx)
);
CREATE INDEX idx_kb_chunks_source ON kb_chunks(source_id);

CREATE INDEX kb_emb_hnsw ON kb_chunks
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);

Why this shape:

  • (workspace, client) index → scoped tools filter cheaply
  • source_uri UNIQUE + content_hash → ingest is idempotent
  • supersedes + archived_at → audit trail for evolving docs (e.g. PRD v1 → v2)
  • HNSW with vector_cosine_ops → fast ANN; rebuild after bulk loads
  • ON DELETE CASCADE → drop source → chunks auto-removed
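vector_cosine_ops pairs with pgvector's <=> operator, which returns cosine distance (1 − cosine similarity), so a search orders by embedding <=> query ascending. The sketch below computes that quantity by brute force, which is the exact ordering the HNSW index approximates; the function names are illustrative.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity: the value pgvector's <=> operator returns (zero vectors not handled)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def exact_top_k(query: Sequence[float], rows, k: int = 5):
    """Brute-force equivalent of ORDER BY embedding <=> query LIMIT k.

    rows are (chunk_id, vector) pairs; HNSW returns an approximation of this ordering.
    """
    return sorted(rows, key=lambda r: cosine_distance(query, r[1]))[:k]
```

At query time, pgvector's hnsw.ef_search setting (default 40) sets the size of the candidate list and so trades recall for speed; it should be at least the LIMIT, e.g. the rerank stage's top-N of 20.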

MCP tools surface

Tool | Args | Purpose
--- | --- | ---
kb_health | (none) | DB ping + sources/chunks count
kb_ingest | text, source_uri, source_type, title, tags[], metadata{}, workspace | Idempotent insert/update
kb_search_ll | query, client_filter, top_k=5, tag_filter, source_type | Work-workspace search with multi-tenant filter
kb_search_mindx | query, top_k=5, … | MindX consulting workspace
kb_search_personal | query, top_k=5, … | Personal workspace
kb_search_shared | query, top_k=5, … | Shared research notes
kb_search_canon | query, top_k=5, … | External vendor authoritative docs
kb_search_all | query, top_k=5, … | Cross-workspace (use sparingly)
kb_stats | (none) | Total + breakdown by workspace / source_type

Each tool is roughly 30–60 LOC. FastMCP's @mcp.tool() decorator handles JSON Schema generation, argument parsing, and response packaging.
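workspaces.py pairs a WORKSPACE_MAP with a scoped tool factory. A sketch of that idea, assuming a map from tool-name suffix to workspace value (inferred from the tool names) and asyncpg-style $n placeholders: each generated function has its workspace bound in a closure, so a caller cannot widen the scope. The SQL text and function shape are assumptions; the function returns the query and parameters instead of running them, and a real version would be registered through FastMCP's tool decorator.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical map: tool-name suffix -> kb_sources.workspace value.
# '_secrets' deliberately has no entry, so no search tool is generated for it.
WORKSPACE_MAP = {
    "ll": "ll",
    "mindx": "mindx",
    "personal": "_personal",
    "shared": "_shared",
    "canon": "_canon",
}

# Assumed query shape: $1 = query embedding, $2 = workspace, $3 = top_k.
SEARCH_SQL = (
    "SELECT c.id, c.text, s.source_uri, c.embedding <=> $1 AS distance "
    "FROM kb_chunks c JOIN kb_sources s ON s.id = c.source_id "
    "WHERE s.workspace = $2 "
    "ORDER BY c.embedding <=> $1 LIMIT $3"
)


def make_search_tool(suffix: str, workspace: str) -> Callable[..., tuple[str, list]]:
    """Build kb_search_<suffix> with its workspace fixed at construction time."""

    def search(query_vec: list[float], top_k: int = 5) -> tuple[str, list]:
        # A real tool would embed the query text and await conn.fetch(...);
        # here it only returns the SQL and parameters it would send.
        return SEARCH_SQL, [query_vec, workspace, top_k]

    search.__name__ = f"kb_search_{suffix}"
    return search


TOOLS = {f"kb_search_{s}": make_search_tool(s, ws) for s, ws in WORKSPACE_MAP.items()}
```

One pgvector behaviour matters for scoped search: with an HNSW index the WHERE filter is applied after the index scan, so a selective workspace filter can return fewer than top_k rows. Raising hnsw.ef_search, or enabling iterative index scans in pgvector 0.8+, mitigates this.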

Chunking strategy

Token-window chunking with overlap, with windows sized well under bge-m3's 8192-token maximum sequence length to keep per-chunk semantic density high. A safety cap on chunks per file keeps extremely long transcripts (100K+ tokens) from blowing up storage: embedding is head-only, covering the first ~50 windows, on the premise that the head captures intent and key decisions.
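The doc does not give window or overlap sizes, so the sketch below uses hypothetical defaults (512-token windows, 64-token overlap) together with the ~50-window head-only cap. It works on any token list; a real pipeline would tokenize with bge-m3's own tokenizer. chunk_windows is an illustrative name.

```python
from __future__ import annotations

from typing import Sequence, TypeVar

T = TypeVar("T")


def chunk_windows(
    tokens: Sequence[T],
    window: int = 512,     # hypothetical; stays well under bge-m3's 8192-token limit
    overlap: int = 64,     # hypothetical overlap between consecutive windows
    max_chunks: int = 50,  # head-only cap: windows past this point are never embedded
) -> list[list[T]]:
    """Split a token sequence into overlapping windows, keeping at most max_chunks."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must be >= 0 and smaller than window")
    stride = window - overlap
    chunks: list[list[T]] = []
    for start in range(0, len(tokens), stride):
        chunks.append(list(tokens[start:start + window]))
        # Stop once the last window reaches the end, or the cap is hit.
        if start + window >= len(tokens) or len(chunks) == max_chunks:
            break
    return chunks
```

With these hypothetical sizes, 50 windows cover 512 + 49 × 448 = 22,464 tokens, so a 100K-token transcript would have roughly its first fifth indexed and the remainder would not be searchable.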

Embedding pipeline (bge-m3)

bge-m3 needs no task-specific prefixes (unlike e5), so queries and passages are encoded as-is; encoding is batched for throughput. The cross-encoder stage then scores the top-N ANN candidates (default 20) with bge-reranker-v2-m3 and reorders them by relevance before returning the top-K.
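The two-stage flow (ANN top-N, then cross-encoder rerank to top-K) as a sketch, with the scorer passed in as a function so it runs without a model. With sentence-transformers, a scorer along the lines of CrossEncoder("BAAI/bge-reranker-v2-m3").predict(pairs) could fill that role; treat that wiring as an assumption. rerank and PairScorer are illustrative names.

```python
from __future__ import annotations

from typing import Callable, Sequence

# Scores a batch of (query, passage) pairs; higher means more relevant.
PairScorer = Callable[[list[tuple[str, str]]], Sequence[float]]


def rerank(
    query: str,
    candidates: Sequence[tuple[int, str]],  # (chunk_id, text) from the ANN stage, best first
    score_pairs: PairScorer,
    top_n: int = 20,
    top_k: int = 5,
) -> list[tuple[int, float]]:
    """Re-score the first top_n ANN candidates with a cross-encoder and keep top_k."""
    pool = list(candidates[:top_n])
    if not pool:
        return []
    scores = score_pairs([(query, text) for _, text in pool])
    ranked = sorted(zip((cid for cid, _ in pool), scores), key=lambda p: p[1], reverse=True)
    return [(cid, float(s)) for cid, s in ranked[:top_k]]
```

The cross-encoder reads query and passage together, which is why it only runs on the small ANN pool: its cost grows with the number of candidates, while the ANN stage keeps that number fixed at top_n.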

Measured retrieval quality (93-query held-out personal eval, 2026-05-21):

Stage | Hit@1 | Hit@3 | MRR
--- | --- | --- | ---
bge-m3 only | 86.0% | 97.8% | 0.918
bge-m3 + bge-reranker-v2-m3 | 89.2% | 97.8% | 0.948

The reranker was shipped as the default for the Hit@1 lift (+3.2 points) and the MRR gain (0.918 → 0.948); Hit@3 is unchanged.
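For reference, the two metrics as a sketch, assuming a query counts as a hit at rank r when its first relevant result appears at rank r (the doc does not say how queries with several relevant sources are scored). evaluate and first_relevant_rank are illustrative names.

```python
from __future__ import annotations

from typing import Sequence


def first_relevant_rank(ranked_ids: Sequence[str], relevant: set[str]) -> int | None:
    """1-based rank of the first relevant result, or None if none was retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            return rank
    return None


def evaluate(runs: Sequence[tuple[Sequence[str], set[str]]], ks=(1, 3)) -> dict[str, float]:
    """runs: one (ranked result ids, relevant ids) pair per query."""
    ranks = [first_relevant_rank(ids, rel) for ids, rel in runs]
    n = len(ranks)
    metrics = {f"hit@{k}": sum(1 for r in ranks if r is not None and r <= k) / n for k in ks}
    metrics["mrr"] = sum(1.0 / r for r in ranks if r is not None) / n  # misses add 0
    return metrics
```

On a 93-query set each query is worth about 1.08 points, and the reported figures are consistent with 80/93 (86.0%) and 83/93 (89.2%) hits at rank 1 and 91/93 (97.8%) at rank 3; in that reading, the reranker's Hit@1 gain amounts to three queries.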

Auth dual-mode

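# oauth_provider.py: load_access_token() falls back to the legacy bearer
import secrets
from mcp.server.auth.provider import AccessToken

async def load_access_token(self, token: str) -> AccessToken | None:  # provider method
    # Legacy bearer: constant-time compare on bytes (non-ASCII str would raise); unset secret never matches
    if LEGACY_BEARER and secrets.compare_digest(token.encode(), LEGACY_BEARER.encode()):
        return AccessToken(token=token, client_id="_legacy_bearer", scopes=["mcp"])
    # Otherwise look the token up among OAuth-issued access tokens
    state = _load_state()
    record = state["access_tokens"].get(token)
    return AccessToken.model_validate(record) if record is not None else None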

This allows:

  • Claude Desktop (mcp-remote bridge → localhost:8080 direct, bearer in header) — primary
  • Claude.ai web (full OAuth flow with DCR + PKCE, via Cloudflare tunnel)
  • Claude iOS app — inherits from claude.ai web account
  • curl scripts on host — bearer for ad-hoc admin

OAuth flow (sequence)

sequenceDiagram
    autonumber
    participant C as Claude.ai
    participant B as Browser
    participant S as MCP Server
    participant DB as State Store

    C->>S: GET /.well-known/oauth-authorization-server
    S-->>C: AS metadata (RFC 8414)

    C->>S: POST /register (DCR)
    S->>DB: Save client
    S-->>C: 201 {client_id}

    C->>B: Redirect to /authorize (client_id, redirect_uri, code_challenge)
    B->>S: GET /authorize
    S->>DB: Create session
    S-->>B: 302 → /login?session=sid

    B->>S: GET /login (password form)
    S-->>B: HTML form
    B->>S: POST /login (password)
    S->>S: PBKDF2 verify (200K iters)
    S->>DB: Generate auth code
    S-->>B: 302 → redirect_uri?code=...&state=...
    B->>C: Forwards code

    C->>S: POST /token (code + verifier)
    S->>S: Verify PKCE challenge
    S->>DB: Issue access_token + refresh_token
    S-->>C: 200 {access_token, refresh_token, expires_in: 3600}

    Note over C,S: Subsequent MCP calls
    C->>S: POST /mcp + Authorization: Bearer
    S->>DB: Validate token
    S-->>C: MCP response

    Note over C,S: After 1 hour
    C->>S: POST /token (grant_type=refresh_token)
    S->>DB: Rotate refresh + new access
    S-->>C: 200 {new tokens}
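The "Verify PKCE challenge" step, sketched with the standard library and assuming the S256 method (the diagram does not name one). Per RFC 7636, code_challenge = BASE64URL(SHA256(ASCII(code_verifier))) with padding removed; at /token the server recomputes it from the presented verifier and compares. The MCP SDK's auth scaffolding may already perform this check; the function names are illustrative.

```python
import base64
import hashlib
import hmac
import secrets


def s256_challenge(verifier: str) -> str:
    """RFC 7636 S256: BASE64URL(SHA256(ASCII(code_verifier))) with '=' padding stripped."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def verify_pkce(verifier: str, stored_challenge: str) -> bool:
    """Server side of the /token exchange: recompute and compare in constant time."""
    try:
        computed = s256_challenge(verifier)
    except UnicodeEncodeError:
        return False  # a valid verifier is ASCII-only
    return hmac.compare_digest(computed.encode(), stored_challenge.encode())


# Client side, before redirecting to /authorize:
verifier = secrets.token_urlsafe(64)  # 86 chars, inside RFC 7636's 43-128 range
challenge = s256_challenge(verifier)
```

Only the challenge travels through the browser redirect; the verifier stays with the client until the back-channel /token call, so an intercepted auth code alone cannot be exchanged.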

Ingest flow (filesystem-first)

[Skill / Hook / Crawler]
        │ writes

<kb-mount>/<workspace>/<folder>/<slug>.md     ← canonical, source-of-truth

        │ fswatch event → tako-mount-watcher launchd job
        │ exec python3 <hooks>/kb-ingest-file.py <path>

[kb-ingest-file.py]
        │ reads file, classify workspace + source_type from path, auto-tag
        │ HTTP POST localhost:8080/mcp (tools/call kb_ingest)

[Server: kb_ingest tool]
        │ SHA-256(content), check kb_sources.content_hash
        │   if match → return {skipped: true}
        │   if differ → DELETE old chunks, recompute
        │ chunk(text), bge-m3 encode (1024-dim)
        │ INSERT kb_sources (or UPDATE), INSERT kb_chunks bulk
        │ persist blob via BlobStore (s3:// primary, file:// fallback)

Postgres + MinIO
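The content-hash branch of kb_ingest as a sketch, with an in-memory FakeKB standing in for kb_sources/kb_chunks and a caller-supplied chunker; embedding is omitted. All names are hypothetical.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FakeKB:
    """In-memory stand-in: content hash and chunks per source_uri."""
    hashes: dict[str, str] = field(default_factory=dict)
    chunks: dict[str, list[str]] = field(default_factory=dict)


def kb_ingest(kb: FakeKB, source_uri: str, text: str,
              chunker: Callable[[str], list[str]]) -> dict:
    content_hash = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if kb.hashes.get(source_uri) == content_hash:
        return {"skipped": True}               # unchanged: no re-embed
    kb.chunks.pop(source_uri, None)            # DELETE old chunks
    kb.chunks[source_uri] = chunker(text)      # chunk (+ embed, omitted here)
    kb.hashes[source_uri] = content_hash       # INSERT or UPDATE the source row last
    return {"skipped": False, "chunks": len(kb.chunks[source_uri])}
```

Writing the hash last means a crash mid-ingest leaves the old hash in place, so the next fswatch event retries instead of skipping. Against Postgres, the chunk DELETE and INSERTs would typically share one transaction so a failed embed never leaves a source without chunks.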

Why filesystem-first: the local filesystem (with Time Machine + the hourly MinIO mirror) is the durable copy. Postgres and MinIO are treated as derived indexes, both rebuildable from the filesystem. The worst case is the re-embed cost (hours), with no data loss.

Performance numbers

Measured on MacBook Pro M2 Max (MPS-accelerated embedder):

Operation | Number | Notes
--- | --- | ---
Daemon memory (idle) | 388 MB | model loaded, no in-flight requests
Daemon memory (active) | 501 MB | during search
Postgres DB size | 3.2 GB | sources + chunks + indexes
Sources indexed | 42,000+ | across 6 workspaces
Chunks indexed | 182,000+ | bge-m3 1024-dim
kb_search_* p95 (warm) | 840 ms | embed + ANN + rerank + tunnel-or-local
kb_search_* p95 (cold) | 2.3 s | includes model load
kb_health | <80 ms | DB ping
Hit@3 on held-out eval | 97.8% | 93 personal-workspace queries
MRR on held-out eval | 0.948 | with reranker
Post-S3 regression test | 100% parity | 1,000-row sample, 2026-05-24
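The table does not say how p95 was computed. One common definition is nearest-rank: sort the samples and take the value at 1-based rank ⌈0.95·n⌉. A sketch, with an illustrative function name:

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based; p * n first avoids float drift
    return ordered[rank - 1]
```

Keeping warm and cold samples separate, as the table does, matters: a handful of cold starts (2.3 s, model load included) mixed into a warm sample set could dominate its p95.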

Reliability features

Feature | How
--- | ---
Idempotent ingest | content_hash compare → skip duplicate uploads
Filesystem source-of-truth | re-buildable from ~/Documents/KB-s3/
Real-time auto-ingest | tako-mount-watcher launchd + fswatch
Hourly blob mirror | tako-fs-backup rclone MinIO → FS
Daily Postgres dump | tako-pg-backup
Auto-restart MCP server | launchd KeepAlive
Auto-restart tunnel | launchd KeepAlive
BlobStore fallback | MinIO down → BlobStore writes/reads from FS mirror automatically
Token rotation | refresh_token rotated on use (one-time)
Token TTL | access 1 h, refresh 30 d, auth code 10 min
GC | expired sessions/codes purged on each provider call
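The last three rows (one-time refresh rotation, TTLs, GC on each call) as an in-memory sketch with an injected clock. TokenStore and its methods are hypothetical: the real provider persists state to disk, a real /token handler also checks the requesting client, and auth-code handling (10 min TTL) is omitted. Tokens come from secrets.token_urlsafe(32), matching the 32-byte urlsafe tokens in the security model.

```python
from __future__ import annotations

import secrets
import time
from typing import Callable

ACCESS_TTL = 3600             # 1 h
REFRESH_TTL = 30 * 24 * 3600  # 30 d


class TokenStore:
    """Hypothetical in-memory store: token -> (client_id, expires_at)."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self.clock = clock
        self.access: dict[str, tuple[str, float]] = {}
        self.refresh: dict[str, tuple[str, float]] = {}

    def _gc(self) -> None:
        """Purge expired tokens; called at the start of every operation."""
        now = self.clock()
        for table in (self.access, self.refresh):
            for tok in [t for t, (_, exp) in table.items() if exp <= now]:
                del table[tok]

    def issue(self, client_id: str) -> tuple[str, str]:
        self._gc()
        now = self.clock()
        access, refresh = secrets.token_urlsafe(32), secrets.token_urlsafe(32)
        self.access[access] = (client_id, now + ACCESS_TTL)
        self.refresh[refresh] = (client_id, now + REFRESH_TTL)
        return access, refresh

    def rotate(self, refresh_token: str) -> tuple[str, str] | None:
        """Exchange a refresh token for a new pair; the old one is consumed (one-time use)."""
        self._gc()
        entry = self.refresh.pop(refresh_token, None)
        if entry is None:
            return None  # unknown, expired, or already used
        return self.issue(entry[0])

    def validate(self, access_token: str) -> str | None:
        self._gc()
        entry = self.access.get(access_token)
        return entry[0] if entry else None
```

Popping the refresh token before issuing the new pair is what makes reuse fail: a second presentation of the same refresh token finds nothing and gets None.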

Security model

Threat | Mitigation
--- | ---
Public MCP endpoint exposure | Local-only by default (localhost:8080); the CF tunnel is opt-in for remote-device access. Auth is required either way.
Token theft | TLS only (CF-managed cert). Tokens are random 32-byte urlsafe strings. Password hashed with PBKDF2, 200K rounds.
Replay attack | Auth codes are single-use (consumed on exchange). Refresh tokens are rotated.
Data exfil from compromised host | Postgres + MinIO bound to localhost; egress requires the explicit tunnel.
Brute-force password | Single-user; CF rate limiting upstream provides some protection. TODO: add a rate limit on /login.
Sensitive content | Embeddings are hard, though not impossible, to invert; chunk text itself is stored in Postgres (localhost-only) and reachable remotely only via authenticated MCP. The _secrets workspace has no MCP tool; only the vault flow accesses it.
Workspace bleed | Scoped tools enforce the workspace at the SQL level; the client filter for LL further restricts results to the tenant + _generic + _multi.
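The LL client restriction as a sketch: the caller's tenant plus the _generic and _multi buckets, passed as one array parameter. ll_scope and the placeholder numbering are assumptions; how kb_search_ll behaves without a client_filter is not documented, so the sketch requires one.

```python
from __future__ import annotations

SHARED_LL_CLIENTS = ("_generic", "_multi")


def ll_scope(client_filter: str, first_param: int = 2) -> tuple[str, list]:
    """WHERE fragment for kb_search_ll: one tenant plus the cross-client buckets.

    first_param is the number of the first $n placeholder available to this fragment.
    """
    allowed = [client_filter, *SHARED_LL_CLIENTS]
    clause = f"s.workspace = 'll' AND s.client = ANY(${first_param}::text[])"
    return clause, [allowed]
```

Two consequences of this shape: the tenant value travels as a bound parameter and is never spliced into the SQL text, and ll rows whose client is NULL never satisfy = ANY(...), so untagged ll sources stay invisible to a filtered search.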

Reproducibility — quickstart for a forker

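# 1. Install Postgres 16 + pgvector, then start Postgres
brew install postgresql@16
brew services start postgresql@16
export PATH="$(brew --prefix postgresql@16)/bin:$PATH"   # postgresql@16 is keg-only
# pgvector must also be installed for this Postgres version (see the pgvector README)
psql -d postgres -c "CREATE DATABASE ragkb;"
psql -d ragkb -c "CREATE EXTENSION vector;"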

# 2. Install MinIO + the mc client, then start the server
brew install minio/stable/minio minio/stable/mc
minio server ~/minio-data &
mc alias set local http://localhost:9000 minioadmin minioadmin   # default credentials
mc mb local/tako-kb

# 3. Clone and bootstrap the server
cd ~/Documents/Side.Projects/
git clone <your-fork>/tako && cd tako
python3.11 -m venv venv
./venv/bin/pip install -r requirements.txt   # mcp[cli], starlette, uvicorn,
                                             # asyncpg, pgvector, sentence-transformers,
                                             # boto3, fastmcp

# 4. Apply schema (see schema section above)
psql -d ragkb -f schema.sql

# 5. Generate secrets
openssl rand -hex 32 > server/.token
openssl rand -base64 24 | tee /tmp/pw            # → PBKDF2 → server/.oauth_env

# 6. Install launchd plists
cp launchd/*.plist ~/Library/LaunchAgents/
launchctl load ~/Library/LaunchAgents/ai.tako.*

# 7. Ingest your existing KB
find ~/Documents/KB-s3/ -name "*.md" -print0 | \
  xargs -0 -n1 python3 hooks/kb-ingest-file.py   # -print0/-0: safe with spaces in paths

# 8. (Optional) Cloudflare tunnel for remote-device access
cloudflared tunnel login
cloudflared tunnel create personal-rag
cloudflared tunnel route dns <tunnel-id> rag.yourdomain.com
# add ingress in ~/.cloudflared/config.yml → http://localhost:8080

# 9. Add custom connector on claude.ai with URL https://rag.yourdomain.com/mcp

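Total: roughly 1–2 hours if Postgres, MinIO, and a domain are already set up. The initial bulk ingest in step 7 is a full embedding pass and can take longer on a large KB.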
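Step 5 turns the generated password into a PBKDF2 hash for server/.oauth_env, which /login later verifies. A sketch with hashlib, assuming a 16-byte random salt and a "salt_hex$hash_hex" storage string; the actual .oauth_env format is not documented, and both function names are hypothetical.

```python
from __future__ import annotations

import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # matches the 200K-iteration PBKDF2-SHA256 setting above


def hash_password(password: str, salt: bytes | None = None) -> str:
    """Return 'salt_hex$hash_hex' (a hypothetical storage format)."""
    salt = salt if salt is not None else secrets.token_bytes(16)
    dk = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return f"{salt.hex()}${dk.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute with the stored salt and compare in constant time."""
    salt_hex, hash_hex = stored.split("$", 1)
    dk = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), ITERATIONS)
    return hmac.compare_digest(dk.hex(), hash_hex)
```

For example, hash_password(open("/tmp/pw").read().strip()) yields the value to store; deleting /tmp/pw afterwards avoids leaving the plaintext password behind.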

Future work

  • Rate limit on /login (5 req/min per IP)
  • TOTP 2FA via pyotp
  • Hybrid BM25 + vector for keyword-heavy queries (e.g. issue IDs, product codes)
  • Per-workspace embedder selection (e.g. code-specific for _canon arxiv subset)
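The planned /login limit (5 requests/min per IP) as a sliding-window sketch with an injected clock; SlidingWindowLimiter is a hypothetical name, and wiring it into the Starlette route is not shown. Behind the Cloudflare tunnel, every request reaches the server from cloudflared on localhost, so the per-IP key would need to come from Cloudflare's CF-Connecting-IP header rather than the socket address.

```python
from __future__ import annotations

import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `limit` attempts per key within any `window`-second span."""

    def __init__(self, limit: int = 5, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self.hits: defaultdict[str, deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        q = self.hits[key]
        while q and q[0] <= now - self.window:
            q.popleft()       # forget attempts older than the window
        if len(q) >= self.limit:
            return False      # caller would answer 429 Too Many Requests
        q.append(now)         # only allowed attempts are recorded
        return True
```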

License & attribution

Personal project. Built on: