🎯 M1 DONE 🧠 AI / RAG FOUNDATION 🔌 MCP

Personal RAG
Knowledge Base

Vector Search · Multilingual NLP · MCP Servers · OAuth 2.0

A production multi-workspace personal RAG system running locally on a MacBook Pro M2 Max. It indexes 42,000+ knowledge documents across 6 workspaces with multilingual semantic search plus a reranker (Vietnamese, English, and code-mixed), and is accessible from Claude Desktop, Claude.ai web, and the iOS app via the Model Context Protocol (MCP). The same foundation pattern is reused across 9 downstream native AI products (Eval-Framework, Knowledge-Audit, Mail-Assistant, AI-Canon-Crawler, and others).

Performance · Local M2 Max
42K+ Sources
182K+ Chunks
840ms p95 warm latency
97.8% Hit@3 (held-out)
6 Workspaces · 9 downstream products reuse this foundation
★ Personal RAG · S3-native · 2026 ★
🤖

AI INFRASTRUCTURE

Production-ready multi-workspace vector RAG system over 42,000+ knowledge documents across 6 workspaces. Multilingual semantic search (Vietnamese + English + code-mixed) accessible from Claude Desktop, web, and iOS app via the Model Context Protocol.

PERFORMANCE

p95 latency 840 ms warm / 2.3 s cold. Hit@3 = 97.8%, MRR = 0.948 on a held-out personal eval set. 182K+ chunks indexed with pgvector HNSW on Postgres 16. 388 MB idle / 501 MB active on MacBook Pro M2 Max.
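
For readers unfamiliar with the retrieval metrics above, here is a minimal sketch of how Hit@k and MRR are commonly computed. It assumes exactly one relevant document per query, and the toy `runs` data is invented for illustration; it is not the project's eval set or eval code.

```python
from typing import List, Sequence, Tuple


def hit_at_k(ranked_ids: Sequence[str], relevant_id: str, k: int) -> bool:
    """True if the relevant document appears in the top-k results."""
    return relevant_id in ranked_ids[:k]


def reciprocal_rank(ranked_ids: Sequence[str], relevant_id: str) -> float:
    """1/rank of the relevant document (1-indexed), or 0.0 if it is absent."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id == relevant_id:
            return 1.0 / rank
    return 0.0


def evaluate(runs: List[Tuple[Sequence[str], str]], k: int = 3) -> Tuple[float, float]:
    """Return (Hit@k, MRR) averaged over all queries."""
    hits = sum(hit_at_k(ranked, rel, k) for ranked, rel in runs)
    mrr = sum(reciprocal_rank(ranked, rel) for ranked, rel in runs)
    return hits / len(runs), mrr / len(runs)


# Toy data (illustrative only): (ranked result ids, the one relevant id).
runs = [
    (["a", "b", "c"], "a"),       # relevant at rank 1
    (["x", "y", "z"], "z"),       # relevant at rank 3
    (["p", "q", "r", "s"], "s"),  # relevant at rank 4, a miss at k=3
    (["m", "n"], "n"),            # relevant at rank 2
]
hit3, mrr = evaluate(runs, k=3)
```

Hit@3 only asks whether the right document made the top three, while MRR also rewards placing it higher, which is why the two are usually reported together.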

🛡️

SECURITY & RELIABILITY

OAuth 2.0 (PKCE + Dynamic Client Registration) with legacy bearer-token fallback. S3-native storage on a MinIO BlobStore with a filesystem mirror as fallback: the mount path is canonical, and the vector DB is a rebuildable derived index. Daily Postgres backup + hourly S3→FS mirror via launchd.

🚀

PRODUCT DECISIONS

Pivoted from a Telegram bot to an MCP server (multi-client support for free). Pivoted the embedding model from BGE-en to multilingual-e5-small, then to bge-m3 + bge-reranker-v2-m3 after multilingual benchmarking. Pivoted from Oracle ADB in the cloud to local Postgres + pgvector + MinIO S3 (no cloud quota risk). Workspace-scoped tools enable trust-tier routing.
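
The page does not spell out how trust tiers are defined, so the following is only one possible reading of workspace-scoped routing: each workspace carries a tier, and the set of searchable workspaces is decided before any retrieval runs. The tier names, workspace names, and the `route` function are all hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class TrustTier(IntEnum):
    """Hypothetical tiers; a higher value means more sensitive content."""
    PUBLIC = 0
    INTERNAL = 1
    PRIVATE = 2


@dataclass(frozen=True)
class Workspace:
    name: str
    tier: TrustTier


# Hypothetical registry; the real system has 6 workspaces with its own names.
WORKSPACES = [
    Workspace("docs", TrustTier.PUBLIC),
    Workspace("work-notes", TrustTier.INTERNAL),
    Workspace("journal", TrustTier.PRIVATE),
]


def route(client_tier: TrustTier, requested: Optional[str] = None) -> List[str]:
    """Return the workspaces a client may search, decided before retrieval."""
    allowed = [w.name for w in WORKSPACES if w.tier <= client_tier]
    if requested is None:
        return allowed
    if requested not in allowed:
        raise PermissionError(
            f"workspace {requested!r} is not available at tier {client_tier.name}"
        )
    return [requested]
```

Because the filter runs before the vector query, a low-tier client never touches chunks from a higher-tier workspace, rather than having them filtered out of the results afterwards.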

★ Built different ★

About this project

Personal RAG started as a way to stop losing my own knowledge. It grew into the foundation pattern I now reuse across every production native AI product I ship. The hot path (chunk → embed → vector ANN → rerank → MCP) is identical whether the dataset is 42K personal docs or 500K enterprise docs. What changes between the two is identity, isolation, trust-tier routing, and ingest velocity, not retrieval.
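
To make the shape of that hot path concrete, here is a toy retrieve-then-rerank sketch in pure Python. Every stage is a deliberately simple stand-in: bag-of-words counts instead of bge-m3 embeddings, brute-force cosine similarity instead of HNSW approximate search, and query-term coverage instead of bge-reranker-v2-m3. Function names and sample documents are invented for illustration.

```python
import math
from collections import Counter
from typing import List, Tuple


def chunk(text: str, size: int = 8, overlap: int = 2) -> List[str]:
    """Split text into overlapping word windows."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]


def embed(text: str) -> Counter:
    """Stand-in embedder: sparse bag-of-words vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, index: List[Tuple[str, Counter]], top_n: int) -> List[str]:
    """Stage 1: brute-force nearest neighbours (a real system would use an ANN index)."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_n]]


def rerank(query: str, candidates: List[str], top_k: int) -> List[str]:
    """Stage 2: stand-in reranker scoring the fraction of query terms covered."""
    q_terms = set(query.lower().split())

    def score(candidate: str) -> float:
        return len(q_terms & set(candidate.lower().split())) / len(q_terms)

    return sorted(candidates, key=score, reverse=True)[:top_k]


docs = [
    "pgvector stores embeddings in Postgres and supports HNSW indexes",
    "MinIO provides S3 compatible object storage",
    "launchd schedules the daily Postgres backup job",
]
index = [(c, embed(c)) for d in docs for c in chunk(d)]

query = "HNSW indexes in Postgres"
candidates = retrieve(query, index, top_n=3)
results = rerank(query, candidates, top_k=2)
```

In this toy run the first stage ranks the short chunk "supports HNSW indexes" highest, and the second stage promotes the longer chunk that covers more of the query, which is the kind of reordering a reranker is there to provide.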

📚

STACK

  • Python 3.11
  • MCP SDK + FastMCP
  • Postgres 16 + pgvector
  • MinIO S3 (BlobStore)
  • bge-m3 + bge-reranker-v2-m3
  • Cloudflare Tunnel
  • OAuth 2.0 + launchd
📈

METRICS

  • 42,000+ sources indexed
  • 182,000+ chunks
  • p95 840 ms warm / 2.3 s cold
  • Hit@3 = 97.8% · MRR 0.948
  • 388 MB idle / 501 MB active
🌱

WHAT I LEARNED

  • Trust tier is a product concept, not an engineering one
  • Workspaces > tags when routing happens before retrieval
  • Adding a reranker beats swapping the embedder alone on Hit@1
  • OAuth DCR enables claude.ai web custom connectors
  • After the S3 migration, the canonical mount path beats direct filesystem access
