Knowledge-Audit — Cross-Source Contradiction Detector

A cross-source audit layer that sits on top of Personal-RAG and continuously checks that the corpus does not silently contradict itself. Built after the AI confidently asserted “we run on Oracle Cloud A1 VM”, naming infrastructure that was never registered.

At a glance

3-layer audit: (1) intra-file consistency, (2) cross-file contradiction within a workspace, (3) cross-workspace contradiction
4-tier cron: daily (light, 4B + Haiku 4.5 verifier) · weekly (8B + Haiku 4.5) · monthly (32B alone, deeper) · event-triggered (post-commit hook on the KB-s3 mount)
Runs locally on MacBook Pro M2 Max via launchd — not in the cloud
Caught 25+ stale facts in a 79-file corpus on day-one deploy
Production swap 2026-05-23: Haiku verifier → Grok 4.3, decided after an Eval-Framework bake-off. Accuracy: 80% under the strict-match scorer, 99% when LLM-judged (the two figures come from different scorers)
Cost: ~$5/month at current cadence (daily + weekly + monthly + event-driven)
Output: Telegram ADHD-friendly digest — severity 🟢🟡🔴 + action verb first + time-boxed + bundled 2× per day
Auto-fix policy (weekly/monthly): Grok proposes → safety heuristics filter → Grok 4.3 judge verifies → git snapshot → apply (review-and-merge gate retained)
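
The auto-fix policy above can be sketched as a small pipeline. This is a minimal illustration, not the project's actual code: the `Fix` shape, the `passes_safety_heuristics` rules, and the injected `judge`, `snapshot`, and `apply` callables are all hypothetical stand-ins for the Grok proposal step, the Grok 4.3 judge, the git snapshot, and the file write.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Fix:
    """A proposed edit to one file (hypothetical shape, for illustration)."""
    path: str
    old_text: str
    new_text: str


def passes_safety_heuristics(fix: Fix, max_chars: int = 500) -> bool:
    """Assumed heuristics: reject empty, no-op, or oversized replacements."""
    if not fix.old_text or fix.old_text == fix.new_text:
        return False
    return len(fix.new_text) <= max_chars


def run_auto_fix(
    proposals: Iterable[Fix],
    judge: Callable[[Fix], bool],
    snapshot: Callable[[], None],
    apply: Callable[[Fix], None],
) -> list[Fix]:
    """Filter, verify, snapshot once, then apply the surviving fixes."""
    # The judge is only consulted for fixes that pass the cheap heuristics.
    approved = [f for f in proposals if passes_safety_heuristics(f) and judge(f)]
    if approved:
        snapshot()  # taken before any write, so every applied fix can be rolled back
        for fix in approved:
            apply(fix)
    return approved
```

Running the cheap heuristics before the judge keeps model calls to a minimum, and taking the snapshot only when something will be written avoids empty commits. The review-and-merge gate is not modelled here; in this sketch it would sit inside whatever `apply` does.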

Sources audited

Memory files (~/.claude/projects/.../memory/*.md)
Workspace CLAUDE.md (4 files: global + 3 per-workspace)
Project NOTES / README / PRD (~50 files across side projects)
Email / Slack / meeting transcripts (LL work)
Confluence dump
KB notes (curated research)
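
Because these sources live in different files and workspaces, each candidate contradiction falls into one of the three audit layers listed under At a glance. A minimal sketch of that classification follows, assuming a hypothetical `Claim` record that carries the workspace and file path a statement came from; the names are illustrative and not taken from the project.

```python
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    INTRA_FILE = 1       # layer 1: both claims come from the same file
    CROSS_FILE = 2       # layer 2: different files, same workspace
    CROSS_WORKSPACE = 3  # layer 3: different workspaces


@dataclass(frozen=True)
class Claim:
    """One extracted statement plus where it was found (hypothetical shape)."""
    workspace: str
    path: str
    text: str


def audit_layer(a: Claim, b: Claim) -> Layer:
    """Decide which audit layer a pair of claims belongs to."""
    if a.workspace != b.workspace:
        return Layer.CROSS_WORKSPACE
    if a.path != b.path:
        return Layer.CROSS_FILE
    return Layer.INTRA_FILE
```

For example, two claims from the same `CLAUDE.md` would be compared at layer 1, while a memory file and a project NOTES file in another workspace would be compared at layer 3.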

Stack

Python 3.11 · launchd (4 calendar-interval agents + 1 file-watcher) · Postgres 16 + pgvector (re-uses Personal-RAG retrieval) · bge-m3 embedder · Grok 4.3 (xAI) production judge · Anthropic Haiku 4.5 cheap-tier verifier · Telegram Bot API digest delivery · git snapshot before auto-fix apply

Documentation

Doc	Read this for
PRD	Problem, scope, success metrics, milestones, build vs buy
Architecture	3-layer + 4-tier diagrams, data flows, auto-fix pipeline
Implementation	Code structure, prompts, judge prompt, perf numbers, reproducibility
Notes	Decision log + production swap to Grok 4.3 + gotchas
Enterprise	5 enterprise adaptations (B2B SaaS, fintech, edtech, healthcare, CX)

Why this matters

Persistent AI memory is becoming the default: Claude Projects, ChatGPT Custom GPTs, Cursor .cursorrules, Continue, and a growing number of IDE assistants. Software engineering keeps drift in check with linters, CI, and observability. AI memory has none of that. Without an audit layer, you can act on stale facts for months before the failure surfaces.

Drift mechanism (the propagation chain)

memory file (stale)
      ↓
project NOTES quotes memory
      ↓
AI reads NOTES + memory, treats as ground truth
      ↓
AI confidently asserts in chat
      ↓
human acts on the assertion
      ↓
production code references infrastructure that does not exist

Catching drift at the top of the chain (memory file) costs $0.001 per audit. Catching it at the bottom (production debug) costs hours.
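
The propagation chain can be modelled as a directed graph in which an edge means "is read or quoted by". The sketch below is only an illustration of why one stale memory file reaches so far; the `downstream_of` helper and the edge data are hypothetical, with node names borrowed from the chain above.

```python
from collections import deque


def downstream_of(stale: str, quotes: dict[str, list[str]]) -> list[str]:
    """Return every artifact reachable from `stale`, in breadth-first order.

    `quotes[x]` lists the artifacts that read or quote `x` (hypothetical edges).
    """
    seen = {stale}
    order = []
    queue = deque([stale])
    while queue:
        node = queue.popleft()
        for reader in quotes.get(node, []):
            if reader not in seen:
                seen.add(reader)
                order.append(reader)
                queue.append(reader)
    return order


# Edges mirror the drift chain: each key is consumed by the items in its list.
chain = {
    "memory file": ["project NOTES", "AI context"],
    "project NOTES": ["AI context"],
    "AI context": ["chat assertion"],
    "chat assertion": ["human action"],
    "human action": ["production code"],
}
affected = downstream_of("memory file", chain)
```

Every entry in `affected` is something that would need rechecking if the fact were caught late, which is the argument for auditing at the source.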

Foundation pattern

Every persistent AI memory needs an audit layer. Knowledge-Audit is the reference implementation for Personal-RAG; the same pattern is what enterprises will need as B2B AI products start shipping persistent memory.