Vector Search · Multilingual NLP · MCP Servers · OAuth 2.0
A production multi-workspace personal RAG system running locally on a MacBook Pro M2 Max. 42,000+ knowledge documents across 6 workspaces are indexed with multilingual semantic search plus a reranker (Vietnamese, English, and code-mixed text) and are accessible from Claude Desktop, Claude.ai web, and the iOS app via the Model Context Protocol (MCP). The same foundation pattern is reused across 9 downstream native AI products (Eval-Framework, Knowledge-Audit, Mail-Assistant, AI-Canon-Crawler, and others).
Latency: p95 of 840 ms warm / 2.3 s cold. Retrieval quality: Hit@3 = 97.8%, MRR = 0.948 on a held-out personal eval set. Index: 182K+ chunks in pgvector (HNSW) on Postgres 16. Memory: 388 MB idle / 501 MB active on a MacBook Pro M2 Max.
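For readers unfamiliar with the two retrieval metrics quoted above, here is a minimal sketch of how Hit@k and MRR are conventionally computed. The function names and the `(ranked_ids, relevant_ids)` data layout are illustrative assumptions, not the project's actual eval harness.

```python
from typing import Hashable, Iterable, Sequence, Set, Tuple

Run = Tuple[Sequence[Hashable], Set[Hashable]]  # (ranked result ids, relevant ids)


def hit_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """1.0 if any relevant id appears in the top-k results, else 0.0."""
    return 1.0 if any(doc in relevant for doc in ranked[:k]) else 0.0


def reciprocal_rank(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """1 / rank of the first relevant result (ranks start at 1), or 0.0 if none."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(runs: Iterable[Run], k: int = 3) -> dict:
    """Average both metrics over all queries in the eval set."""
    runs = list(runs)
    if not runs:
        raise ValueError("eval set is empty")
    n = len(runs)
    return {
        "hit@k": sum(hit_at_k(r, rel, k) for r, rel in runs) / n,
        "mrr": sum(reciprocal_rank(r, rel) for r, rel in runs) / n,
    }
```

Hit@3 counts a query as a success if any relevant document lands in the top three. MRR also rewards placing it first, which is why the two numbers are usually reported together.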
OAuth 2.0 (PKCE + Dynamic Client Registration) with a legacy bearer-token fallback. S3-native storage on a MinIO BlobStore with a filesystem mirror as fallback. The mount path is canonical; the vector DB is a rebuildable derived index. launchd runs a daily Postgres backup and an hourly S3→FS mirror.
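As an illustration of the PKCE half of that flow, the sketch below generates a code verifier and derives its `S256` code challenge as defined in RFC 7636. It assumes the `S256` method is in use (the text above does not say which method the server accepts), and the function names are hypothetical.

```python
import base64
import hashlib
import secrets


def _b64url(data: bytes) -> str:
    """Base64url-encode without '=' padding, as RFC 7636 requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_code_verifier(n_bytes: int = 32) -> str:
    """Random high-entropy verifier; 32 bytes encode to 43 characters."""
    return _b64url(secrets.token_bytes(n_bytes))


def s256_challenge(verifier: str) -> str:
    """code_challenge = BASE64URL(SHA256(ASCII(code_verifier)))."""
    return _b64url(hashlib.sha256(verifier.encode("ascii")).digest())
```

The client sends the challenge with the authorization request and the verifier with the token request. The server recomputes the challenge and rejects the exchange if the two do not match.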
Pivoted from a Telegram bot to an MCP server (multi-client support for free). Pivoted the embedding model from BGE-en to multilingual-e5-small, then to bge-m3 + bge-reranker-v2-m3, after multilingual benchmarking. Pivoted from Oracle ADB in the cloud to local Postgres + pgvector + MinIO S3 (no cloud quota risk). Workspace-scoped tools enable trust-tier routing.
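One way to picture trust-tier routing over workspace-scoped tools is the sketch below. The tier names, the `Workspace` type, and the "clearance covers tier" rule are assumptions made for illustration; the source does not describe how its tiers are defined or enforced.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Iterable, List


class TrustTier(IntEnum):
    # Hypothetical tiers; a higher value means more sensitive content.
    PUBLIC = 0
    INTERNAL = 1
    PRIVATE = 2


@dataclass(frozen=True)
class Workspace:
    name: str
    tier: TrustTier


def allowed_workspaces(workspaces: Iterable[Workspace], client_tier: TrustTier) -> List[str]:
    """Return the workspaces a client may query: those at or below its clearance."""
    return [w.name for w in workspaces if w.tier <= client_tier]
```

Because each tool call names a workspace, a check like this can run before any retrieval happens, so a lower-trust client never sees chunks from a more sensitive workspace.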
Personal RAG started as a way to stop losing my own knowledge. It grew into the foundation pattern I now reuse across every production native AI product I ship. The hot path (chunk → embed → vector ANN → rerank → MCP) is identical whether the dataset is 42K personal docs or 500K enterprise docs. The differences lie in identity, isolation, trust-tier routing, and ingest velocity, not in retrieval.
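The shape of that hot path can be sketched in a few dependency-free lines. Every stage here is a toy stand-in chosen for illustration: a hashed bag-of-words replaces the bge-m3 embedder, brute-force cosine similarity replaces the pgvector HNSW index, and token overlap replaces bge-reranker-v2-m3. The MCP layer is omitted, and all function names are hypothetical.

```python
import math
import zlib
from typing import List, Sequence, Tuple

DIM = 64  # toy embedding width; real embedding models use far more dimensions


def chunk(text: str, size: int = 8, overlap: int = 2) -> List[str]:
    """Split text into overlapping windows of `size` words."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


def embed(text: str) -> List[float]:
    """Stand-in embedder: L2-normalised hashed bag-of-words."""
    vec = [0.0] * DIM
    for tok in text.lower().split():
        vec[zlib.crc32(tok.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def vector_search(query_vec: Sequence[float],
                  index: Sequence[Tuple[str, List[float]]],
                  top_n: int) -> List[str]:
    """Stand-in for ANN: exact cosine search over every indexed chunk."""
    scored = [(sum(q * d for q, d in zip(query_vec, vec)), text)
              for text, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_n]]


def rerank(query: str, candidates: Sequence[str], top_k: int) -> List[str]:
    """Stand-in reranker: Jaccard overlap between query and chunk tokens."""
    q = set(query.lower().split())

    def score(candidate: str) -> float:
        toks = set(candidate.lower().split())
        union = q | toks
        return len(q & toks) / len(union) if union else 0.0

    return sorted(candidates, key=score, reverse=True)[:top_k]


def retrieve(query: str, docs: Sequence[str],
             top_n: int = 10, top_k: int = 3) -> List[str]:
    """chunk -> embed -> vector search -> rerank, the same order as the hot path."""
    index = [(c, embed(c)) for d in docs for c in chunk(d)]
    candidates = vector_search(embed(query), index, top_n)
    return rerank(query, candidates, top_k)
```

The two-stage design is the point: a cheap, recall-oriented vector search narrows the corpus to `top_n` candidates, and a more expensive reranker orders only that short list. Swapping any stage for its production counterpart leaves `retrieve` unchanged.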