🎯 M1 DONE 🧠 AI / RAG ⚡ FOUNDATION 🔌 MCP

Personal RAG
Knowledge Base

Vector Search · Multilingual NLP · MCP Servers · OAuth 2.0

A production multi-workspace personal RAG system running locally on a MacBook Pro M2 Max. It indexes 42,000+ knowledge documents across 6 workspaces with multilingual semantic search and a reranker (Vietnamese, English, and code-mixed), accessible from Claude Desktop, Claude.ai web, and the iOS app via the Model Context Protocol. The same foundation pattern is reused across 9 downstream native AI products (Eval-Framework, Knowledge-Audit, Mail-Assistant, AI-Canon-Crawler, and others).

Performance · Local M2 Max

42K+ Sources

182K+ Chunks

840ms p95 warm latency

97.8% Hit@3 (held-out)

6 Workspaces · 9 downstream products reuse this foundation

★ Personal RAG · S3-native · 2026 ★

🤖

AI INFRASTRUCTURE

Production-ready multi-workspace vector RAG system over 42,000+ knowledge documents across 6 workspaces. Multilingual semantic search (Vietnamese + English + code-mixed) accessible from Claude Desktop, web, and iOS app via the Model Context Protocol.

⚡

PERFORMANCE

p95 latency 840 ms warm / 2.3 s cold. Hit@3 = 97.8%, MRR = 0.948 on a held-out personal eval set. 182K+ chunks indexed with pgvector HNSW on Postgres 16. 388 MB idle / 501 MB active on MacBook Pro M2 Max.
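
For readers unfamiliar with the two retrieval metrics quoted above, here is a minimal sketch of how Hit@k and MRR are conventionally computed over an eval set. The function names and the `(ranked_ids, relevant_ids)` input shape are assumptions for illustration, not the project's actual eval harness.

```python
from typing import Sequence


def hit_at_k(ranked_ids: Sequence[str], relevant_ids: set[str], k: int) -> float:
    """Return 1.0 if any relevant id appears in the top-k results, else 0.0."""
    return 1.0 if any(doc_id in relevant_ids for doc_id in ranked_ids[:k]) else 0.0


def reciprocal_rank(ranked_ids: Sequence[str], relevant_ids: set[str]) -> float:
    """Return 1/rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def evaluate(runs: Sequence[tuple[Sequence[str], set[str]]], k: int = 3) -> dict[str, float]:
    """Average Hit@k and reciprocal rank over all queries in the eval set."""
    if not runs:
        raise ValueError("eval set is empty")
    n = len(runs)
    return {
        "hit_at_k": sum(hit_at_k(ranked, rel, k) for ranked, rel in runs) / n,
        "mrr": sum(reciprocal_rank(ranked, rel) for ranked, rel in runs) / n,
    }
```

Under this definition, Hit@3 = 97.8% means that for 97.8% of eval queries at least one relevant chunk appeared in the top three results.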

🛡️

SECURITY & RELIABILITY

OAuth 2.0 (PKCE + DCR) with legacy bearer fallback. S3-native storage on MinIO BlobStore with filesystem mirror fallback. The mount path is canonical; the vector DB is a rebuildable derived index. Daily Postgres backup + hourly S3→FS mirror via launchd.
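
As a rough illustration of the PKCE part of that flow, the sketch below generates a code verifier and its S256 code challenge in the way RFC 7636 describes, using only the standard library. The function names are hypothetical, and the project's actual OAuth server code is not shown in the source.

```python
import base64
import hashlib
import secrets


def make_code_verifier(n_bytes: int = 32) -> str:
    """Random URL-safe verifier; 32 random bytes encode to 43 characters."""
    raw = secrets.token_bytes(n_bytes)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def make_code_challenge(verifier: str) -> str:
    """S256 challenge: base64url(SHA-256(verifier)) with padding stripped."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def verify_pkce(verifier: str, stored_challenge: str) -> bool:
    """Server-side check at the token endpoint, using a constant-time compare."""
    return secrets.compare_digest(make_code_challenge(verifier), stored_challenge)
```

The client sends the challenge with the authorization request and the verifier with the token request; the server recomputes the challenge and compares.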

🚀

PRODUCT DECISIONS

Pivoted Telegram bot → MCP server (multi-client for free). Pivoted embedding model BGE-en → multilingual-e5-small → bge-m3 + bge-reranker-v2-m3 after multilingual benchmarking. Pivoted Oracle ADB cloud → local Postgres + pgvector + MinIO S3 (no cloud quota risk). Workspace-scoped tools enable trust-tier routing.
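
One way to picture workspace-scoped, trust-tier routing is as a filter applied before any retrieval runs. The sketch below is purely illustrative: the tier names, the workspace names, and the `searchable_workspaces` helper are all hypothetical and are not taken from the project.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class TrustTier(IntEnum):
    """Hypothetical tiers; a higher value means more sensitive content."""
    PUBLIC = 0
    PERSONAL = 1
    CONFIDENTIAL = 2


@dataclass(frozen=True)
class Workspace:
    name: str
    tier: TrustTier


# Hypothetical workspaces, for illustration only.
WORKSPACES = [
    Workspace("notes", TrustTier.PERSONAL),
    Workspace("public-docs", TrustTier.PUBLIC),
    Workspace("finance", TrustTier.CONFIDENTIAL),
]


def searchable_workspaces(client_tier: TrustTier,
                          requested: Optional[list[str]] = None) -> list[str]:
    """Return the workspaces a client may query, decided before retrieval runs."""
    allowed = [w.name for w in WORKSPACES if w.tier <= client_tier]
    if requested is None:
        return allowed
    return [name for name in requested if name in allowed]
```

Because the filter runs first, a lower-tier client never causes a query against a higher-tier workspace's index at all.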

★ Built different ★

About this project

Personal RAG started as a way to stop losing my own knowledge. It grew into the foundation pattern I now reuse across every production native AI product I ship. The hot path (chunk → embed → vector ANN → rerank → MCP) is identical whether the dataset is 42K personal docs or 500K enterprise docs. The deltas are around identity, isolation, trust-tier routing, and ingest velocity — not retrieval.
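
The shape of that hot path can be sketched in a few lines of pure Python. Everything here is a toy stand-in chosen so the example runs without dependencies: bag-of-words counts replace the bge-m3 embeddings, brute-force cosine replaces the pgvector HNSW index, and a token-overlap score replaces bge-reranker-v2-m3. The MCP serving step is omitted, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

TOKEN = re.compile(r"\w+")


def tokenize(text: str) -> list[str]:
    return TOKEN.findall(text.lower())


def chunk(text: str, max_words: int = 40) -> list[str]:
    """Split a document into fixed-size word windows."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy embedding: token counts (stand-in for a dense embedding model)."""
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def vector_search(query: str, chunks: list[str], top_n: int = 10) -> list[str]:
    """Brute-force nearest neighbours (stand-in for an ANN index)."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:top_n]


def rerank(query: str, candidates: list[str], top_k: int = 3) -> list[str]:
    """Toy reranker: fraction of query tokens found in each candidate."""
    q_tokens = set(tokenize(query))

    def score(candidate: str) -> float:
        return len(q_tokens & set(tokenize(candidate))) / len(q_tokens) if q_tokens else 0.0

    return sorted(candidates, key=score, reverse=True)[:top_k]


def retrieve(query: str, documents: list[str], top_n: int = 10, top_k: int = 3) -> list[str]:
    """chunk -> embed -> vector search -> rerank, over an in-memory corpus."""
    chunks = [c for doc in documents for c in chunk(doc)]
    return rerank(query, vector_search(query, chunks, top_n), top_k)
```

The two-stage structure is the point: a cheap, wide first pass (`top_n`) followed by a more expensive, narrow second pass (`top_k`).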

📚

STACK

Python 3.11
MCP SDK + FastMCP
Postgres 16 + pgvector
MinIO S3 (BlobStore)
bge-m3 + bge-reranker-v2-m3
Cloudflare Tunnel
OAuth 2.0 + launchd

📈

METRICS

42,000+ sources indexed
182,000+ chunks
p95 840 ms warm / 2.3 s cold
Hit@3 = 97.8% · MRR 0.948
388 MB idle / 501 MB active

🌱

WHAT I LEARNED

Trust tier is a product concept, not an engineering one
Workspaces > tags when routing happens before retrieval
Reranker beats raw embedder swap on Hit@1
OAuth DCR enables claude.ai web custom connectors
After the S3 migration, treating the mount path as canonical beats direct filesystem access

Read the docs ↓

PRD

Architecture

Implementation

Notes

Enterprise
