Implementation

Sister docs: PRD (intent), Architecture (system view), Notes (decision log).

TL;DR

A production-ready RAG system in continuous personal use:

42,000+ sources / 182,000+ chunks across 6 workspaces (mixed Vietnamese + English text)
Postgres 16 + pgvector with HNSW; 3.2 GB on disk
bge-m3 multilingual embedder (1024-dim) + bge-reranker-v2-m3 cross-encoder rerank
MCP Streamable HTTP server (FastMCP) with OAuth 2.0 (PKCE + DCR) — accessible from Claude Desktop, Claude.ai web, Claude iOS app
Workspace-scoped tools: kb_search_ll, kb_search_mindx, kb_search_personal, kb_search_shared, kb_search_canon
Latency: p95 840 ms warm / 2.3 s cold end-to-end
Resource: 388 MB idle / 501 MB active on MacBook Pro M2 Max
S3-native storage (since 2026-05-24): MinIO BlobStore primary + filesystem mirror fallback; dual-scheme URIs (s3:// + file://)
Auto-ingest: launchd tako-mount-watcher watches the KB mount path via fswatch
Backups: launchd tako-pg-backup (daily) + tako-fs-backup (hourly)

Stack

Layer	Component	Version / Notes
Compute	MacBook Pro M2 Max	local; daemon footprint 388 MB idle / 501 MB active
OS	macOS	launchd-managed jobs
Runtime	Python	3.11 + venv
MCP SDK	`mcp`	Streamable HTTP transport, built-in OAuth scaffolding
HTTP server	uvicorn + Starlette	`127.0.0.1:8080` (tunnel-fronted when remote access needed)
Embedding	`bge-m3` via `sentence-transformers`	1024-dim, multilingual (incl. Vietnamese), MPS-accelerated
Reranker	`bge-reranker-v2-m3`	cross-encoder, top-N rerank stage
Vector DB	Postgres 16 + `pgvector`	HNSW index, cosine distance
Driver	`asyncpg`	local socket / TCP
Blob store	MinIO (S3-compatible)	primary; FS mirror fallback
Mirror tool	rclone	hourly S3 → FS sync
Tunnel	cloudflared	named tunnel; optional, only for remote-device access
Auth	OAuth 2.0 + PBKDF2 password	PBKDF2-HMAC-SHA256, 200K iterations; DCR-enabled, refresh-token rotation
Process manager	launchd	`ai.tako.mcp`, `ai.tako.mount-watcher`, `ai.tako.pg-backup`, `ai.tako.fs-backup`, `ai.tako.cloudflared`

The internal package name in code remains tako; this doc uses the product name Personal-RAG for clarity.

Directory layout

Server (`~/Documents/Side.Projects/tako/server/`)

src/
├── server_local.py            # FastMCP wrapper, Starlette routes
├── oauth_provider.py          # OAuthAuthorizationServerProvider impl
├── oauth_login.py             # /login GET + POST routes
├── blobstore.py               # BlobStore abstraction (MinIO + FS fallback)
├── workspaces.py              # WORKSPACE_MAP + scoped tool factory
├── bulk_migrate.py            # bulk re-ingest helper
├── re_embed.py                # re-embed existing chunks after a model swap
├── backup.py                  # pg_dump driver
└── restore.py                 # disaster recovery driver

lib/
├── instructions.py            # MCP server orchestration playbook (sent as `instructions` in the initialize result)
└── classifiers.py             # path → workspace + source_type rules

Operator host (`~/.claude/`)

hooks/
├── save-convo.py              # Stop hook → archive AI session + auto-ingest
├── kb-ingest-file.py          # Generic file → kb_ingest helper (workspace-aware)
└── kb-ingest-file.log

launchd/
├── ai.tako.mcp.plist
├── ai.tako.mount-watcher.plist
├── ai.tako.pg-backup.plist
├── ai.tako.fs-backup.plist
└── ai.tako.cloudflared.plist

KB mount (`~/Documents/KB-s3/`)

KB-s3/
├── ll/         # work — partitioned by client (pcf/, newlife/, ilham/, bbl/, _generic/, _multi/)
├── mindx/      # consulting
├── _personal/  # side projects, memory, finance, health
├── _shared/    # cross-cutting research
├── _canon/     # vendor authoritative docs (Anthropic/xAI/HF/arxiv)
└── _secrets/   # encrypted vault — NOT exposed via MCP

The mount path is the canonical write target since the S3 migration. Files written under the mount get a URI of the form s3://tako-kb/<key>; legacy paths outside the mount keep file://<abs> URIs and remain indexed for backward compatibility.
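
The URI rule above can be sketched as a small helper. This is a minimal illustration, assuming the mount root and bucket name given in this document; the function name `to_source_uri` is hypothetical and not taken from the codebase.

```python
from pathlib import Path

# Assumed values, taken from the layout described above.
KB_MOUNT = Path.home() / "Documents" / "KB-s3"
BUCKET = "tako-kb"


def to_source_uri(path: Path, mount: Path = KB_MOUNT, bucket: str = BUCKET) -> str:
    """Map a local path to its source URI (hypothetical helper).

    Paths under the mount become s3://<bucket>/<key>; anything else keeps
    a file:// URI built from the absolute path.
    """
    path = Path(path)
    try:
        key = path.relative_to(mount)
    except ValueError:
        # Not under the mount: legacy location, keep the file scheme.
        return path.absolute().as_uri()
    return f"s3://{bucket}/{key.as_posix()}"
```

Note that `Path.as_uri()` percent-encodes characters such as spaces, so the stored `file://` URI may differ textually from the raw path.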

Schema

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE kb_sources (
    id            BIGSERIAL PRIMARY KEY,
    workspace     TEXT NOT NULL,          -- 'll' | 'mindx' | '_personal' | '_shared' | '_canon' | '_secrets'
    client        TEXT,                   -- 'PCF' | 'NewLife' | 'Ilham' | 'BBL' | '_generic' | '_multi' | NULL
    source_uri    TEXT NOT NULL,          -- s3://tako-kb/... or file://...
    source_type   TEXT,
    title         TEXT,
    tags          TEXT,                   -- comma-sep
    metadata      JSONB,
    content_hash  TEXT,                   -- sha256, idempotency key
    supersedes    BIGINT REFERENCES kb_sources(id),
    archived_at   TIMESTAMPTZ,
    ingested_at   TIMESTAMPTZ DEFAULT now(),
    updated_at    TIMESTAMPTZ DEFAULT now(),
    CONSTRAINT uq_kb_sources_uri UNIQUE (source_uri)
);
CREATE INDEX idx_kb_sources_ws_client ON kb_sources(workspace, client);
CREATE INDEX idx_kb_sources_type      ON kb_sources(source_type);
CREATE INDEX idx_kb_sources_ingested  ON kb_sources(ingested_at);

CREATE TABLE kb_chunks (
    id          BIGSERIAL PRIMARY KEY,
    source_id   BIGINT NOT NULL REFERENCES kb_sources(id) ON DELETE CASCADE,
    chunk_idx   INT NOT NULL,
    text        TEXT NOT NULL,
    embedding   vector(1024),             -- bge-m3
    CONSTRAINT uq_kb_chunks_src_idx UNIQUE (source_id, chunk_idx)
);
CREATE INDEX idx_kb_chunks_source ON kb_chunks(source_id);

CREATE INDEX kb_emb_hnsw ON kb_chunks
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);

Why this shape:

(workspace, client) index → scoped tools filter cheaply
source_uri UNIQUE + content_hash → ingest is idempotent
supersedes + archived_at → audit trail for evolving docs (e.g. PRD v1 → v2)
HNSW with vector_cosine_ops → fast ANN; rebuild after bulk loads
ON DELETE CASCADE → drop source → chunks auto-removed
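
To make the first and fourth points concrete, here is a sketch of how a scoped search could be expressed against this schema. It only builds the SQL and the parameter list; the function name, the `archived_at IS NULL` filter, and the exact column selection are assumptions, and how the embedding is bound (for example through a pgvector codec registered on the driver) depends on the actual setup. `<=>` is pgvector's cosine-distance operator, which matches the `vector_cosine_ops` index.

```python
from typing import Any, List, Optional, Tuple

# Shared LL partitions that stay visible under any client filter (per this doc).
SHARED_CLIENTS = ("_generic", "_multi")


def build_scoped_search(
    query_vec: List[float],
    workspace: str,
    top_k: int = 5,
    client: Optional[str] = None,
) -> Tuple[str, List[Any]]:
    """Build a workspace-scoped ANN query (illustrative, not the real tool)."""
    params: List[Any] = [query_vec, workspace]
    # Excluding archived sources is an assumption about the desired behavior.
    where = ["s.workspace = $2", "s.archived_at IS NULL"]
    if client is not None:
        params.append([client, *SHARED_CLIENTS])
        where.append(f"s.client = ANY(${len(params)})")
    params.append(top_k)
    sql = (
        "SELECT c.source_id, c.chunk_idx, c.text, "
        "1 - (c.embedding <=> $1) AS similarity "
        "FROM kb_chunks c JOIN kb_sources s ON s.id = c.source_id "
        f"WHERE {' AND '.join(where)} "
        "ORDER BY c.embedding <=> $1 "
        f"LIMIT ${len(params)}"
    )
    return sql, params
```

Keeping the workspace as a bound parameter inside the builder, rather than as a tool argument, is what lets each scoped tool pin its own workspace.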

MCP tools surface

Tool	Args	Purpose
`kb_health`	—	DB ping + sources/chunks count
`kb_ingest`	text, source_uri, source_type, title, tags[], metadata{}, workspace	Idempotent insert/update
`kb_search_ll`	query, client_filter, top_k=5, tag_filter, source_type	Work-workspace search with multi-tenant filter
`kb_search_mindx`	query, top_k=5, …	MindX consulting workspace
`kb_search_personal`	query, top_k=5, …	Personal workspace
`kb_search_shared`	query, top_k=5, …	Shared research notes
`kb_search_canon`	query, top_k=5, …	External vendor authoritative docs
`kb_search_all`	query, top_k=5, …	Cross-workspace (use sparingly)
`kb_stats`	—	Total + breakdown by workspace / source_type

Each tool is roughly 30–60 LOC. The FastMCP @mcp.tool() decorator handles JSON Schema generation, argument parsing, and response packaging.

Chunking strategy

Token-window chunking with overlap, sized to fit comfortably under bge-m3’s 8192-token max sequence while keeping per-chunk semantic density high. A safety cap on chunks per file keeps extremely long transcripts (100K+ tokens) from blowing up storage: only the first ~50 windows are embedded (head-only), which is usually enough to capture intent and key decisions.
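
A minimal sketch of this strategy follows. The window and overlap sizes are placeholders (the document does not state them), the input is assumed to be already tokenized, and `chunk_tokens` is a hypothetical name; only the cap of about 50 windows comes from the text above.

```python
from typing import List


def chunk_tokens(
    tokens: List[str],
    window: int = 512,     # assumed size, not from the source
    overlap: int = 64,     # assumed size, not from the source
    max_chunks: int = 50,  # head-only cap described above
) -> List[List[str]]:
    """Split tokens into overlapping windows, keeping only the head."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must be in [0, window)")
    step = window - overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        if len(chunks) >= max_chunks:
            break  # cap reached: the tail of a very long input is dropped
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break  # this window already reached the end of the input
    return chunks
```

With `window=4` and `overlap=1`, ten tokens yield three chunks, each starting on the last token of the previous one.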

Embedding pipeline (bge-m3)

bge-m3 does not require explicit task prefixes (unlike e5), so text is passed to the embedder as-is; encoding is batched for throughput. The cross-encoder rerank runs on the top-N (default 20) candidates from the ANN stage and reorders them by bge-reranker-v2-m3 relevance score before the top-K are returned.
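
The two-stage flow can be sketched independently of the models. In the sketch below, `rerank_top_n` and `overlap_score` are hypothetical names, and the word-overlap scorer is a toy stand-in so the example runs without downloading anything; in a real pipeline `score_pairs` would presumably wrap the cross-encoder, for example the `predict` method of a sentence-transformers `CrossEncoder`, which accepts a list of (query, passage) pairs.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]


def rerank_top_n(
    query: str,
    candidates: Sequence[str],
    score_pairs: Callable[[List[Pair]], Sequence[float]],
    top_n: int = 20,
    top_k: int = 5,
) -> List[str]:
    """Rescore the first top_n ANN candidates and return the best top_k."""
    head = list(candidates[:top_n])  # candidates arrive in ANN order
    if not head:
        return []
    scores = score_pairs([(query, text) for text in head])
    # Sort by score descending; ties keep their original ANN order.
    order = sorted(range(len(head)), key=lambda i: (-scores[i], i))
    return [head[i] for i in order[:top_k]]


def overlap_score(pairs: List[Pair]) -> List[float]:
    """Toy stand-in for a cross-encoder: counts shared lowercase words."""
    return [
        float(len(set(q.lower().split()) & set(d.lower().split())))
        for q, d in pairs
    ]
```

Scoring all pairs in one call mirrors the batched style mentioned above and avoids one model invocation per candidate.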

Measured retrieval quality (93-query held-out personal eval, 2026-05-21):

Stage	Hit@1	Hit@3	MRR
bge-m3 only	86.0%	97.8%	0.918
bge-m3 + bge-reranker-v2-m3	89.2%	97.8%	0.948

The reranker was shipped as the default for its modest Hit@1 lift (86.0% → 89.2%) and MRR lift (0.918 → 0.948); Hit@3 is unchanged at 97.8%.
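
For reference, the metrics in the table follow the usual definitions, sketched below. The input format is an assumption: one entry per query holding the 1-based rank of the first relevant result, or `None` when nothing relevant was returned. The sample ranks in the usage note are made up and are not the eval data.

```python
from typing import List, Optional


def hit_at_k(ranks: List[Optional[int]], k: int) -> float:
    """Fraction of queries whose first relevant result is within the top k."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(1 for r in ranks if r is not None and r <= k) / len(ranks)


def mrr(ranks: List[Optional[int]]) -> float:
    """Mean reciprocal rank; queries with no relevant result contribute 0."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)
```

For example, ranks of `[1, 2, None, 1]` give Hit@1 = 0.5, Hit@3 = 0.75 and MRR = 0.625.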

Auth dual-mode

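# oauth_provider.py: load_access_token() falls back to the legacy bearer
import secrets
from mcp.server.auth.provider import AccessToken

async def load_access_token(self, token: str) -> AccessToken | None:
    # Constant-time compare on bytes (LEGACY_BEARER assumed to be a str);
    # comparing raw str values raises TypeError on non-ASCII input.
    if secrets.compare_digest(token.encode(), LEGACY_BEARER.encode()):
        return AccessToken(token=token, client_id="_legacy_bearer", scopes=["mcp"])
    record = _load_state()["access_tokens"].get(token)
    return AccessToken.model_validate(record) if record else None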

This allows:

Claude Desktop (mcp-remote bridge → localhost:8080 direct, bearer in header) — primary
Claude.ai web (full OAuth flow with DCR + PKCE, via Cloudflare tunnel)
Claude iOS app — inherits from claude.ai web account
curl scripts on host — bearer for ad-hoc admin

OAuth flow (sequence)

sequenceDiagram
    autonumber
    participant C as Claude.ai
    participant B as Browser
    participant S as MCP Server
    participant DB as State Store

    C->>S: GET /.well-known/oauth-authorization-server
    S-->>C: AS metadata (RFC 8414)

    C->>S: POST /register (DCR)
    S->>DB: Save client
    S-->>C: 201 {client_id}

    C->>B: Redirect to /authorize?<br/>client_id, redirect_uri, code_challenge

    B->>S: GET /authorize
    S->>DB: Create session
    S-->>B: 302 → /login?session=sid

    B->>S: GET /login (password form)
    S-->>B: HTML form

    B->>S: POST /login (password)
    S->>S: PBKDF2 verify (200K iters)
    S->>DB: Generate auth code
    S-->>B: 302 → redirect_uri?code=...&state=...

    B->>C: Forwards code

    C->>S: POST /token (code + verifier)
    S->>S: Verify PKCE challenge
    S->>DB: Issue access_token + refresh_token
    S-->>C: 200 {access_token, refresh_token, expires_in: 3600}

    Note over C,S: Subsequent MCP calls
    C->>S: POST /mcp + Authorization: Bearer
    S->>DB: Validate token
    S-->>C: MCP response

    Note over C,S: After 1 hour
    C->>S: POST /token (grant_type=refresh_token)
    S->>DB: Rotate refresh + new access
    S-->>C: 200 {new tokens}
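
The two verification steps in the sequence, the PBKDF2 check at login and the PKCE check at token exchange, can be sketched with the standard library alone. This is an illustration under assumptions: the function names are hypothetical, the S256 challenge method is assumed, and how the salt and derived key are stored is not described in this document.

```python
import base64
import hashlib
import hmac


def verify_pkce_s256(code_verifier: str, code_challenge: str) -> bool:
    """RFC 7636 S256: challenge == BASE64URL(SHA256(verifier)), unpadded."""
    try:
        digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
        presented = code_challenge.encode("ascii")
    except UnicodeEncodeError:
        return False  # both values are ASCII by specification
    expected = base64.urlsafe_b64encode(digest).rstrip(b"=")
    return hmac.compare_digest(expected, presented)


def hash_password(password: str, salt: bytes, iterations: int = 200_000) -> bytes:
    """Derive a key with PBKDF2-HMAC-SHA256 (200K iterations per this doc)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


def verify_password(
    password: str, salt: bytes, stored: bytes, iterations: int = 200_000
) -> bool:
    """Recompute the derived key and compare in constant time."""
    return hmac.compare_digest(hash_password(password, salt, iterations), stored)
```

Both comparisons use `hmac.compare_digest` so that timing does not reveal how many leading bytes matched.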

Ingest flow (filesystem-first)

[Skill / Hook / Crawler]
        │ writes
        ▼
<kb-mount>/<workspace>/<folder>/<slug>.md     ← canonical, source-of-truth
        │
        │ fswatch event → tako-mount-watcher launchd job
        │ exec python3 <hooks>/kb-ingest-file.py <path>
        ▼
[kb-ingest-file.py]
        │ reads file, classifies workspace + source_type from path, auto-tags
        │ HTTP POST localhost:8080/mcp tools/call kb_ingest
        ▼
[Server: kb_ingest tool]
        │ SHA-256(content), check kb_sources.content_hash
        │   if match → return {skipped: true}
        │   if differ → DELETE old chunks, recompute
        │ chunk(text), bge-m3 encode (1024-dim)
        │ INSERT kb_sources (or UPDATE), INSERT kb_chunks bulk
        │ persist blob via BlobStore (s3:// primary, file:// fallback)
        ▼
Postgres + MinIO

Why filesystem-first: the local filesystem (with Time Machine + hourly MinIO mirror) is durable. Postgres + MinIO are treated as derived indexes, both rebuildable from the filesystem. The worst case is the cost of re-embedding (on the order of hours), with no data loss.
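
The hash check at the heart of `kb_ingest` can be sketched with an in-memory dict standing in for `kb_sources` and `kb_chunks`. The `ingest` function, the dict layout, and the paragraph-split "chunking" are all simplifications for illustration; the real tool writes to Postgres and embeds each chunk.

```python
import hashlib
from typing import Dict


def ingest(store: Dict[str, dict], source_uri: str, text: str) -> dict:
    """Idempotent upsert keyed by source_uri, gated on a SHA-256 content hash."""
    content_hash = hashlib.sha256(text.encode("utf-8")).hexdigest()
    existing = store.get(source_uri)
    if existing is not None and existing["content_hash"] == content_hash:
        return {"skipped": True}  # unchanged content: nothing to re-embed
    # New or changed content: replace the chunks wholesale, mirroring the
    # "DELETE old chunks, recompute" step in the flow above.
    store[source_uri] = {
        "content_hash": content_hash,
        "chunks": text.split("\n\n"),  # stand-in for real chunking
    }
    return {"skipped": False, "updated": existing is not None}
```

Because the decision depends only on the URI and the content hash, replaying the same fswatch event or re-running a bulk ingest does not create duplicates.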

Performance numbers

Measured on MacBook Pro M2 Max (MPS-accelerated embedder):

Operation	Number	Notes
Daemon memory — idle	388 MB	model loaded, no inflight
Daemon memory — active	501 MB	during search
Postgres DB size	3.2 GB	sources + chunks + indexes
Sources indexed	42,000+	across 6 workspaces
Chunks indexed	182,000+	bge-m3 1024-dim
`kb_search_*` p95 (warm)	840 ms	embed + ANN + rerank + tunnel-or-local
`kb_search_*` p95 (cold)	2.3 s	includes model load
`kb_health`	<80 ms	DB ping
Hit@3 on held-out eval	97.8%	93 personal-workspace queries
MRR on held-out eval	0.948	with reranker
Post-S3 regression test	100% parity	1000-row sample, 2026-05-24

Reliability features

Feature	How
Idempotent ingest	`content_hash` compare → skip duplicate uploads
Filesystem source-of-truth	re-buildable from `~/Documents/KB-s3/`
Real-time auto-ingest	`tako-mount-watcher` launchd + fswatch
Hourly blob mirror	`tako-fs-backup` rclone MinIO → FS
Daily Postgres dump	`tako-pg-backup`
Auto-restart MCP server	launchd `KeepAlive`
Auto-restart tunnel	launchd `KeepAlive`
BlobStore fallback	MinIO down → BlobStore writes/reads from FS mirror automatically
Token rotation	refresh_token rotated on use (one-time)
Token TTL	access 1 h, refresh 30 d, auth code 10 min
GC	expired sessions/codes purged on each provider call
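
The token rotation and TTL rows can be illustrated with a small in-memory sketch. The state layout and function names are assumptions made for the example; the TTL values and the 32-byte URL-safe tokens come from this document.

```python
import secrets
import time
from typing import Optional

ACCESS_TTL = 3600               # access token: 1 h
REFRESH_TTL = 30 * 24 * 3600    # refresh token: 30 d


def issue_tokens(state: dict, client_id: str, now: Optional[float] = None) -> dict:
    """Mint a fresh access/refresh pair and record both with expiry times."""
    now = time.time() if now is None else now
    access = secrets.token_urlsafe(32)
    refresh = secrets.token_urlsafe(32)
    state["access_tokens"][access] = {"client_id": client_id, "expires_at": now + ACCESS_TTL}
    state["refresh_tokens"][refresh] = {"client_id": client_id, "expires_at": now + REFRESH_TTL}
    return {"access_token": access, "refresh_token": refresh, "expires_in": ACCESS_TTL}


def rotate_refresh(state: dict, refresh_token: str, now: Optional[float] = None) -> Optional[dict]:
    """Consume a refresh token exactly once and issue a new pair."""
    now = time.time() if now is None else now
    record = state["refresh_tokens"].pop(refresh_token, None)  # one-time use
    if record is None or record["expires_at"] <= now:
        return None  # unknown, already used, or expired
    return issue_tokens(state, record["client_id"], now)
```

Removing the old refresh token before issuing the new pair is what makes a replayed token fail on its second use.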

Security model

Threat	Mitigation
Public MCP endpoint exposure	Local-only by default (`localhost:8080`); CF tunnel is opt-in for remote-device access. Auth required either way.
Token theft	TLS only (CF-managed cert). Tokens random 32-byte urlsafe. PBKDF2 200K rounds for password.
Replay attack	Auth codes single-use (consumed on exchange). Refresh tokens rotated.
Data exfil from compromised host	Postgres + MinIO bound to localhost; egress requires explicit tunnel.
Brute-force password	Single-user; CF rate-limit upstream provides some protection. TODO: add rate limit on `/login`.
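Sensitive content	Embeddings are lossy and not directly readable as text, though this is not a cryptographic guarantee since inversion attacks can recover approximations; original text is only retrievable via authenticated MCP. `_secrets` workspace has no MCP tool; only the vault flow accesses it.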
Workspace bleed	Scoped tools enforce workspace at SQL level; client filter for LL further restricts to tenant + `_generic` + `_multi`.

Reproducibility — quickstart for a forker

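# 1. Install Postgres 16 + pgvector
brew install postgresql@16
brew install pgvector   # if this formula does not target postgresql@16, build pgvector from source instead
brew services start postgresql@16
psql -d postgres -c "CREATE DATABASE ragkb;"
psql -d ragkb -c "CREATE EXTENSION vector;"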

# 2. Install MinIO + the mc client, then start the server
brew install minio/stable/minio minio/stable/mc
minio server ~/minio-data &
mc alias set local http://localhost:9000 minioadmin minioadmin
mc mb local/tako-kb

# 3. Clone and bootstrap the server
cd ~/Documents/Side.Projects/
git clone <your-fork>/tako && cd tako
python3.11 -m venv venv
./venv/bin/pip install -r requirements.txt   # mcp[cli], starlette, uvicorn,
                                             # asyncpg, pgvector, sentence-transformers,
                                             # boto3, fastmcp

# 4. Apply schema (see schema section above)
psql -d ragkb -f schema.sql

# 5. Generate secrets
openssl rand -hex 32 > server/.token
openssl rand -base64 24 | tee /tmp/pw            # → PBKDF2 → server/.oauth_env

# 6. Install launchd plists
cp launchd/*.plist ~/Library/LaunchAgents/
launchctl load ~/Library/LaunchAgents/ai.tako.*

# 7. Ingest your existing KB (NUL-delimited so paths with spaces survive)
find ~/Documents/KB-s3/ -name "*.md" -print0 | \
  xargs -0 -n1 python3 hooks/kb-ingest-file.py

# 8. (Optional) Cloudflare tunnel for remote-device access
cloudflared tunnel create personal-rag
cloudflared tunnel route dns <tunnel-id> rag.yourdomain.com
# add ingress in ~/.cloudflared/config.yml → http://localhost:8080

# 9. Add custom connector on claude.ai with URL https://rag.yourdomain.com/mcp

Total: 1–2 hours if Postgres + MinIO + a domain are ready.

Future work

Rate limit on /login (5 req/min per IP)
TOTP 2FA via pyotp
Hybrid BM25 + vector for keyword-heavy queries (e.g. issue IDs, product codes)
Per-workspace embedder selection (e.g. code-specific for _canon arxiv subset)
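
As a starting point for the first item, a sliding-window limiter along these lines could enforce 5 requests per minute per IP. This is a sketch, not part of the current codebase: the class name is hypothetical, state is kept in process memory, and wiring it into the /login route is left out. Behind the Cloudflare tunnel the client address would presumably have to be taken from the `CF-Connecting-IP` header rather than the socket peer.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class SlidingWindowLimiter:
    """Allow at most `limit` hits per key within the last `window_s` seconds."""

    def __init__(self, limit: int = 5, window_s: float = 60.0) -> None:
        self.limit = limit
        self.window_s = window_s
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[key]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # over the limit: reject without recording the hit
        hits.append(now)
        return True
```

A rejected request would typically map to an HTTP 429 response on the login route.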

License & attribution

Personal project. Built on:
