A cross-source audit layer that sits on top of Personal-RAG and continuously verifies that the corpus does not silently contradict itself. Built after the AI confidently asserted “we run on Oracle Cloud A1 VM” — infrastructure that was never registered.
At a glance
- 3-layer audit: (1) intra-file consistency, (2) cross-file contradiction within a workspace, (3) cross-workspace contradiction
- 4-tier cron: daily (light, 4B + Haiku 4.5 verifier) · weekly (8B + Haiku 4.5) · monthly (32B alone, deeper) · event-triggered (post-commit hook on the KB-s3 mount)
- Runs locally on MacBook Pro M2 Max via launchd — not in the cloud
- Caught 25+ stale facts in a 79-file corpus on day-one deploy
- Production swap on 2026-05-23: Haiku verifier → Grok 4.3, chosen after an Eval-Framework bake-off. Measured accuracy: 80% under the strict-match scorer → 99% when LLM-judged
- Cost: ~$5/month at current cadence (daily + weekly + monthly + event-driven)
- Output: ADHD-friendly Telegram digest, with severity (🟢🟡🔴), action verb first, time-boxed items, bundled 2× per day
- Auto-fix policy (weekly/monthly): Grok propose → safety heuristics filter → Grok 4.3 judge verify → git snapshot → apply (review-and-merge gate retained)
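The auto-fix ordering in the last bullet (propose, filter, judge, snapshot, apply) can be sketched as below. This is a minimal illustration, not the project's code: `Fix`, `passes_safety_heuristics`, `run_autofix`, and the length threshold are hypothetical names and values, and the judge, snapshot, and apply steps are passed in as plain callables standing in for the Grok 4.3 judge call, the git snapshot, and the file write.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Fix:
    """A proposed replacement of one stale statement (hypothetical shape)."""
    path: str
    old: str
    new: str


def passes_safety_heuristics(fix: Fix, max_chars: int = 400) -> bool:
    """Assumed filter: reject empty, no-op, or oversized replacements."""
    if not fix.old or fix.old == fix.new:
        return False
    return len(fix.old) <= max_chars and len(fix.new) <= max_chars


def run_autofix(
    proposals: Sequence[Fix],
    judge: Callable[[Fix], bool],
    snapshot: Callable[[], None],
    apply: Callable[[Fix], None],
) -> List[Fix]:
    """Filter, judge, snapshot once, then apply; returns the applied fixes."""
    # Cheap heuristics run first so the judge only sees plausible fixes.
    accepted = [f for f in proposals if passes_safety_heuristics(f) and judge(f)]
    if accepted:
        snapshot()  # taken before any file is touched, so a rollback is possible
        for fix in accepted:
            apply(fix)
    return accepted
```

Taking the snapshot only when at least one fix survives avoids empty commits. To keep the review-and-merge gate, `apply` could write to a branch rather than the working copy.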
Sources audited
- Memory files (~/.claude/projects/.../memory/*.md)
- Workspace CLAUDE.md (4 files: global + 3 per-workspace)
- Project NOTES / README / PRD (~50 files across side projects)
- Email / Slack / meeting transcripts (LL work)
- Confluence dump
- KB notes (curated research)
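Given sources like those listed above, the three audit layers differ mainly in which files get compared with which. The sketch below enumerates those comparison scopes. It assumes each source is tagged with a workspace name, and `audit_scopes` is a hypothetical helper rather than the project's API. It pairs files exhaustively for clarity; a real run would more likely use retrieval to narrow candidates first.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Source = Tuple[str, str]  # (workspace, path); an assumed representation


def audit_scopes(sources: Sequence[Source]) -> Dict[str, list]:
    """Enumerate what each of the three audit layers would compare."""
    by_workspace: Dict[str, List[str]] = defaultdict(list)
    for workspace, path in sources:
        by_workspace[workspace].append(path)

    # Layer 1: every file is checked against itself.
    intra_file = [path for _, path in sources]

    # Layer 2: file pairs that share a workspace.
    cross_file = [
        pair
        for paths in by_workspace.values()
        for pair in combinations(sorted(paths), 2)
    ]

    # Layer 3: file pairs that span two different workspaces.
    cross_workspace = [
        (a, b)
        for w1, w2 in combinations(sorted(by_workspace), 2)
        for a in sorted(by_workspace[w1])
        for b in sorted(by_workspace[w2])
    ]
    return {
        "intra_file": intra_file,
        "cross_file": cross_file,
        "cross_workspace": cross_workspace,
    }
```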
Stack
Python 3.11 · launchd (4 calendar-interval agents + 1 file-watcher) · Postgres 16 + pgvector (re-uses Personal-RAG retrieval) · bge-m3 embedder · Grok 4.3 (xAI) production judge · Anthropic Haiku 4.5 cheap-tier verifier · Telegram Bot API digest delivery · git snapshot before auto-fix apply
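As a rough illustration of the launchd side of the stack, the sketch below generates property lists for the three named calendar tiers plus one file-watcher using Python's standard `plistlib`. `Label`, `ProgramArguments`, `StartCalendarInterval`, and `WatchPaths` are standard launchd keys. The label prefix, script path, `--tier` flag, watched directory, and run times are all assumptions for illustration only.

```python
import plistlib
from typing import Dict

# Hypothetical placeholders: none of these values come from the project.
LABEL_PREFIX = "com.example.knowledge-audit"
AUDIT_CMD = ["/usr/bin/python3", "/path/to/audit.py"]
WATCHED_DIR = "/path/to/kb-mount"

# Assumed run times; only the daily/weekly/monthly cadence is from the README.
CALENDAR_TIERS: Dict[str, Dict[str, int]] = {
    "daily": {"Hour": 6, "Minute": 0},
    "weekly": {"Weekday": 0, "Hour": 7, "Minute": 0},  # Weekday 0 = Sunday
    "monthly": {"Day": 1, "Hour": 8, "Minute": 0},
}


def build_agents() -> Dict[str, bytes]:
    """Return launchd property lists (XML bytes) keyed by agent label."""
    agents: Dict[str, bytes] = {}
    for tier, when in CALENDAR_TIERS.items():
        label = f"{LABEL_PREFIX}.{tier}"
        agents[label] = plistlib.dumps({
            "Label": label,
            "ProgramArguments": AUDIT_CMD + ["--tier", tier],
            "StartCalendarInterval": when,
        })
    # Event tier: launchd starts the job when the watched path changes.
    label = f"{LABEL_PREFIX}.event"
    agents[label] = plistlib.dumps({
        "Label": label,
        "ProgramArguments": AUDIT_CMD + ["--tier", "event"],
        "WatchPaths": [WATCHED_DIR],
    })
    return agents
```

Each value could be written to `~/Library/LaunchAgents/<label>.plist` and loaded with `launchctl`. Generating the files from one table keeps the tier schedule in a single place.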
Documentation
| Doc | Read this for |
|---|---|
| PRD | Problem, scope, success metrics, milestones, build vs buy |
| Architecture | 3-layer + 4-tier diagrams, data flows, auto-fix pipeline |
| Implementation | Code structure, prompts, judge prompt, perf numbers, reproducibility |
| Notes | Decision log + production swap to Grok 4.3 + gotchas |
| Enterprise | 5 enterprise adaptations (B2B SaaS, fintech, edtech, healthcare, CX) |
Why this matters
Persistent AI memory is becoming the default — Claude Projects, ChatGPT Custom GPTs, Cursor .cursorrules, Continue, every IDE. Software engineering solved drift with linters + CI + observability. AI memory has none of that. Without an audit layer, you act on stale facts for months before the failure surfaces.
Drift mechanism (the propagation chain)
memory file (stale)
↓
project NOTES quotes memory
↓
AI reads NOTES + memory, treats as ground truth
↓
AI confidently asserts in chat
↓
human acts on the assertion
↓
production code references infrastructure that does not exist
Catching drift at the top of the chain (memory file) costs $0.001 per audit. Catching it at the bottom (production debug) costs hours.
Foundation pattern
Every persistent AI memory needs an audit layer. Knowledge-Audit is the reference implementation for Personal-RAG; the same pattern is what enterprises will need as B2B AI products start shipping persistent memory.