Why your AI memory lies to you — and how I built a 3-layer Knowledge-Audit to catch it

TL;DR — Yesterday the AI told me I was running my personal infrastructure on an Oracle Cloud A1 VM (ARM free tier). Confident, with an architecture diagram and a citation from the workspace CLAUDE.md. The problem: I never registered that VM. It does not exist. I built Knowledge-Audit (private repo) — a 3-layer audit that runs on a weekly cron and caught more than 25 stale facts in my 79-file corpus within a day of deploying it. This post explains why every persistent-memory setup needs an audit layer.

AI memory drift is the silent problem of 2026

Claude Code, ChatGPT custom GPTs, Cursor, Continue — all of them are now shipping “persistent memory”:

Files inside ~/.claude/projects/.../memory/
Workspace CLAUDE.md instructions
Per-project .cursorrules
Hidden vector DBs for session history

The promise: the AI knows your context, your preferences, your projects.

The reality: the AI confidently asserts facts that were true three months ago, but aren’t anymore.

The classic pattern:

Day 1: You write “I’m using Postgres 14 for this side project” into a memory file.
Day 60: You upgrade to Postgres 16. You forget to update the memory file.
Day 120: The AI confidently suggests a Postgres-14-only feature that breaks on 16.

Multiply that by 50+ memory files, 12 side projects, 5+ workspace CLAUDE.md files, infrastructure that shifts weekly. Drift is inevitable without active validation.

Worse — the LLM doesn’t know what it doesn’t know. It treats memory as ground truth.

My specific hallucination

I had written “OCI Foundation (VM A1 + ADB 23ai + Object Storage + Block Volume + Email Delivery + Functions). Set up once, shared by 12 projects. Prerequisite for personal-rag-kb (P0)” in a workspace CLAUDE.md — as a plan from February.

Six months later, the AI assistant read that line and treated it as deployed infrastructure. A memory file mail_watcher_project.md mentioned “couldn’t register A1” but the AI never cross-referenced.

When I asked it to “deploy a new cron job”, the AI replied: “Deploying to the Oracle A1 cron-host VM, using ADB 23ai for persistence.” I believed it. I started writing code against fake infrastructure.

The audit tool I had just built flagged the inconsistency with conflict confidence 95: the workspace file said A1 was live, while memory said it was never registered. The AI had been compounding a hallucination for six months.

Software engineering solved this problem a long time ago

For code, we have:

Linters that catch broken references at build time
Type checkers that flag inconsistent types
CI/CD that runs tests on every commit
Observability that alerts when prod drifts from spec

For AI memory? Nothing.

You write notes → the AI reads them → the AI tells you facts → you act. No verification step. No “this fact is 90 days old and the server it references no longer responds”. No “wait, this fact contradicts another file”.

I built Knowledge-Audit to fill that gap.

Three layers, escalating cost

The core insight: different stale-fact patterns need different detection methods.

Layer 1 — Static check (free)
   ↓
Layer 2 — LLM cross-source ($0.20/run)
   ↓
Layer 3 — Live probe (free)

Layer 1: Static checks

Free, fast, catches the obvious:

A cited path that doesn’t exist (os.path.exists)
Invalid IP format (octets > 255)
Port out of range
Malformed URL

Catches: “I cite /Users/old/path/foo.py in five places, but I moved it to /Users/new/path/.”

Limit: syntax errors only, not semantic ones.
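
As a rough illustration of the four checks above, here is a minimal sketch using only the Python standard library. The function names and the STATIC_CHECKS table are hypothetical and chosen for this example; they are not taken from the Knowledge-Audit source.

```python
import ipaddress
import os
from urllib.parse import urlparse


def check_path(value: str) -> bool:
    """A cited path passes only if it exists on this machine."""
    return os.path.exists(value)


def check_ip(value: str) -> bool:
    """Reject malformed IPv4 addresses such as 10.0.0.256."""
    try:
        ipaddress.IPv4Address(value)
        return True
    except ValueError:
        return False


def check_port(value: str) -> bool:
    """Valid ports are plain integers in 1..65535."""
    return value.isascii() and value.isdigit() and 1 <= int(value) <= 65535


def check_url(value: str) -> bool:
    """Require an http(s) scheme and a non-empty host."""
    try:
        parts = urlparse(value)
    except ValueError:
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# Hypothetical dispatch table: claim type -> check function.
STATIC_CHECKS = {"path": check_path, "ip": check_ip, "port": check_port, "url": check_url}


def static_findings(claims):
    """Return the (claim_type, value) pairs that fail their check."""
    return [(t, v) for t, v in claims if not STATIC_CHECKS[t](v)]
```

Calling static_findings([("ip", "10.0.0.256"), ("port", "8080")]) would return only the IP claim, since the port is in range.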

Layer 2: LLM cross-source contradiction

The most expensive layer, but the most powerful. Bundle ALL audit files into one Sonnet 4.6 prompt:

## CATEGORY: memory
### file_a.md ...
### file_b.md ...

## CATEGORY: workspace_claude_md
### CLAUDE.md ...

## CATEGORY: project_notes
### NOTES.md ...

Find facts that contradict each other or look stale.
Output JSON with confidence + evidence chain.

The LLM reads everything at once and spots:

File A says “X is true”, file B says “X is false” → cross_source_mismatch
File C says “current” but the date is 6 months old → stale_fact
File D claims something exists that file E describes as “planned” → broken_assumption

This is the layer that caught the Oracle A1 hallucination.

Cost: ~$0.20 per weekly run. Worth every cent.
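
To make the bundling step concrete, here is a sketch of how the prompt could be assembled and how the model's JSON reply could be filtered. It deliberately leaves out the API call. The function names, the assumption that the reply is a top-level JSON array, the field names "type" and "confidence", and the threshold of 80 are all assumptions made for illustration.

```python
import json


def build_audit_prompt(files_by_category):
    """Bundle {category: {filename: text}} into one prompt string."""
    sections = []
    for category, files in files_by_category.items():
        sections.append(f"## CATEGORY: {category}")
        for name, text in files.items():
            sections.append(f"### {name}\n{text.strip()}")
        sections.append("")  # blank line between categories
    sections.append("Find facts that contradict each other or look stale.")
    sections.append("Output JSON with confidence + evidence chain.")
    return "\n".join(sections)


def parse_findings(raw_json, min_confidence=80):
    """Keep findings at or above a confidence threshold.

    Assumes the reply is a JSON array of objects with a numeric
    "confidence" field (hypothetical schema).
    """
    findings = json.loads(raw_json)
    return [f for f in findings if f.get("confidence", 0) >= min_confidence]
```

Putting every category in a single prompt is what lets the model compare a workspace file against a memory file; auditing each file in isolation would not surface cross-source contradictions.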

Layer 3: Live probe

Free, executes shell commands to verify claims:

{
  "claim_type": "port",
  "claim_value": "8080",
  "auto_probe": "lsof -ti :8080 -sTCP:LISTEN | head -1"
}

Memory says “rag-kb daemon listens on port 8080” but lsof doesn’t see it → finding.

Probe templates per claim type:

path → test -e
port → lsof -ti
url → curl -sI -m 5 -o /dev/null -w '%{http_code}'
version → product-specific (python3 --version, gcloud --version, …)
hostname → host

Plus a manual registry (probes.json) with 23 entries for project-specific facts (GCP machine type, vault decrypt, daemon health, …).
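
A probe runner along these lines can be sketched in a few functions. This is an illustration under stated assumptions: the PROBE_TEMPLATES dictionary copies three of the templates listed above, while the helper names (build_probe, run_probe, path_claim_holds, port_claim_holds) and the 10-second timeout are hypothetical.

```python
import shlex
import subprocess

# Templates copied from the list above; {value} is substituted shell-quoted.
PROBE_TEMPLATES = {
    "path": "test -e {value}",
    "port": "lsof -ti :{value} -sTCP:LISTEN | head -1",
    "hostname": "host {value}",
}


def build_probe(claim_type, claim_value):
    """Fill the template for a claim type with a shell-quoted value."""
    return PROBE_TEMPLATES[claim_type].format(value=shlex.quote(claim_value))


def run_probe(command, timeout=10):
    """Run one probe in a shell; return (exit_code, stdout)."""
    proc = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout
    )
    return proc.returncode, proc.stdout.strip()


def path_claim_holds(path):
    """`test -e` exits 0 when the path exists."""
    code, _ = run_probe(build_probe("path", path))
    return code == 0


def port_claim_holds(port):
    """The pipeline's exit status comes from `head`, so check for a PID on stdout."""
    _, out = run_probe(build_probe("port", port))
    return out != ""
```

Quoting the claim value with shlex.quote matters here because the values come from free-text notes and are passed to a shell.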

The brittleness trap I almost fell into

The first version had a clever idea: hash-based suppression. When the LLM flagged something that was actually fine, store the finding fingerprint in suppressions.json with an expiry. Future runs skip it.

I added 14 suppressions for legitimate-but-flagged-anyway findings.

Next run: 7 NEW findings, all logically equivalent to suppressed ones, but with different fingerprints. Why? The LLM rephrases findings on every run — slightly different wording, different example quotes, different fingerprint hash.

I was playing whack-a-mole.
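
The failure mode is easy to reproduce. The sketch below assumes a fingerprint built by hashing the normalized finding text with SHA-256; the actual fingerprint scheme is not described in this post, and the two finding strings are invented examples of the same issue worded two ways.

```python
import hashlib


def fingerprint(finding_text):
    """Hash the normalized finding text (assumed scheme: SHA-256, first 12 hex chars)."""
    normalized = " ".join(finding_text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:12]


# Two hypothetical phrasings of the same underlying finding.
run_1 = "Path uses username marcng but current user is marcmax"
run_2 = "The marcng folder path does not match the current username marcmax"

suppressed = {fingerprint(run_1)}

# The second run describes the same issue, but its hash is not in the set.
still_flagged = fingerprint(run_2) not in suppressed
```

Normalizing case and whitespace helps only with trivial variation; any real rewording produces a new hash, which is why the suppression list never converges.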

The fix: add explanatory context to the source file.

Instead of suppressing “marcng path looks wrong” by hash, I added a comment:

<!-- NOTE: `-Users-marcng-Documents-Personal-Assistant` is the Claude Code
project folder encoding for legacy sessions when the user had username `marcng`.
The current system uses `marcmax` but Claude Code preserves the old folder name.
The folder REALLY EXISTS at /Users/marcmax/.claude/... — intentional legacy
preservation. -->

Now the LLM reads the file, sees the explanation, stops flagging. Durable, no dependence on hashes.

The deeper lesson: bad suppression hides the problem; good context teaches the auditor.

Catch-up when the Mac is off

I wanted the daily run at 3 AM. macOS cron doesn’t catch up missed runs if the Mac was off at that time. macOS launchd does — with RunAtLoad: true plus a state file to dedupe.

<key>StartCalendarInterval</key>
<dict>
    <key>Hour</key><integer>3</integer>
    <key>Minute</key><integer>0</integer>
</dict>
<key>RunAtLoad</key><true/>

Plus state-file logic in audit.py:

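import sys
import time

# Skip-if-recent guard: last_run is the epoch timestamp read from the state file
hours_since = (time.time() - last_run) / 3600
if hours_since < 23:
    print(f"SKIP: tier={tier} ran {hours_since:.1f}h ago")
    sys.exit(0)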

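Mac off all day → boots at 9 AM → RunAtLoad fires the audit when the agent loads → state file shows last run 2 days ago → audit runs. Mac restarts at 9 AM after a successful 3 AM run → RunAtLoad fires again → state file shows 6h ago → skip. If the Mac was only asleep at 3 AM, launchd itself runs the missed StartCalendarInterval job on wake, so RunAtLoad is needed only for the powered-off case.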

Three pieces work together as defense-in-depth: the calendar schedule, RunAtLoad, and the skip-if-recent guard.
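
The state-file side of this can be sketched as follows. The JSON layout (one epoch timestamp per tier), the function names, and the MIN_GAP_HOURS constant are assumptions for illustration; only the 23-hour threshold comes from the guard shown above.

```python
import json
import time
from pathlib import Path

MIN_GAP_HOURS = 23  # same threshold as the skip-if-recent guard


def _load_state(state_path):
    """Read the state file; treat a missing or corrupt file as empty."""
    try:
        return json.loads(Path(state_path).read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return {}


def hours_since_last_run(state_path, tier, now=None):
    """Return hours since `tier` last ran, or None if it never ran."""
    now = time.time() if now is None else now
    last_run = _load_state(state_path).get(tier)
    return None if last_run is None else (now - last_run) / 3600


def should_run(state_path, tier, now=None):
    """Run if the tier has never run or the last run is old enough."""
    hours = hours_since_last_run(state_path, tier, now)
    return hours is None or hours >= MIN_GAP_HOURS


def record_run(state_path, tier, now=None):
    """Store the completion time (epoch seconds) for `tier`."""
    state = _load_state(state_path)
    state[tier] = time.time() if now is None else now
    Path(state_path).write_text(json.dumps(state))
```

Recording the timestamp only after a successful run means a crashed audit is retried on the next trigger instead of being skipped.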

PII redaction is non-negotiable

Layer 2 sends file content to the Anthropic API. Even with zero-retention, defense-in-depth says: redact secrets first.

import re

REDACTIONS = [
    (re.compile(r"sk-ant-[a-zA-Z0-9_\-]{20,}"), "[REDACTED:anthropic_key]"),
    (re.compile(r"\bsk-(?!ant-)[A-Za-z0-9]{20,}"), "[REDACTED:openai_key]"),
    (re.compile(r"ghp_[A-Za-z0-9]{36,}"), "[REDACTED:github_pat]"),
    (re.compile(r"AKIA[0-9A-Z]{16}"), "[REDACTED:aws_access_key]"),
    (re.compile(r"-----BEGIN[A-Z ]+PRIVATE KEY-----[\s\S]+?-----END[A-Z ]+PRIVATE KEY-----"),
     "[REDACTED:private_key]"),
    # ... 10+ patterns
]

For credentials_vault.md and other high-risk files, also partial-redact emails (for example, alice@example.com → [user]@example.com), keeping the domain for cross-source diff.

Tested with 6 unit cases. 100% pass.
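
For illustration, applying the patterns could look like the sketch below. It repeats three of the patterns above so it runs on its own. The redact function, its redact_emails flag, and the simplified EMAIL regex are assumptions for this example, not the code from the tool.

```python
import re

# A subset of the patterns listed above, repeated so the sketch is self-contained.
REDACTIONS = [
    (re.compile(r"sk-ant-[a-zA-Z0-9_\-]{20,}"), "[REDACTED:anthropic_key]"),
    (re.compile(r"ghp_[A-Za-z0-9]{36,}"), "[REDACTED:github_pat]"),
    (re.compile(r"AKIA[0-9A-Z]{16}"), "[REDACTED:aws_access_key]"),
]

# Simplified email pattern (an assumption, not a full RFC 5322 matcher).
# Group 1 captures the domain so it can be kept.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@([A-Za-z0-9.-]+\.[A-Za-z]{2,})")


def redact(text, redact_emails=False):
    """Apply every secret pattern, then optionally mask email local parts."""
    for pattern, replacement in REDACTIONS:
        text = pattern.sub(replacement, text)
    if redact_emails:
        text = EMAIL.sub(r"[user]@\1", text)
    return text
```

Running redact on every file before it enters the Layer 2 prompt keeps the check in one place, and the labeled placeholders still let the model see that a key of a given kind was present.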

Production results — 7 audit runs in one day

Run	RED	YELLOW	What changed
1 (initial)	10	15	baseline scan
2 (after 13 patches)	6	17	applied LLM-suggested fixes
3 (worktree fix)	2	21	bulk-fixed 56 worktree CLAUDE.md files
4 (probe fix)	0	26	fixed `lsof -ti` template bug
5 (suppressions)	2	7	new findings surfaced
6 (memory updates)	2	7	duplicate fingerprints
7 (root-cause fixes)	0	10	clean ✅

Total fixes shipped in one day:

16 memory file edits (closing open questions: stay LL through probation, hybrid Claude for Voice-Assistant, defer AI cofounder Q3, M2 cold-start benchmark verified)
56 workspace CLAUDE.md updates
14 fingerprint suppressions
5 root-cause prompt-context fixes
1 probe bug fix
23 manual probes registered

Why this matters RIGHT NOW

Three things are happening at the same time:

Persistent AI memory is becoming the default. Claude Projects, ChatGPT Custom GPTs, Cursor .cursorrules, Continue .continue/, every IDE.
The personal LLM stack is growing. I have 12 side projects, 50+ memory files, 5+ workspace instructions. Multiply by millions of developers.
Nobody audits. People discover drift after acting on stale info (“wait, that VM doesn’t exist anymore?”), not before.

The cost is invisible until it hits production. By then, trust has already eroded.

Build it yourself

Architecture summary (about 30 Python files, source available):

3 verification layers (static / LLM / live probe)
4-tier cadence (daily probes / weekly full / monthly exhaustive / event-driven hooks)
launchd agents for catch-up after Mac off/sleep
PII redaction before sending to the LLM
Fingerprint-based suppression with expiry
Source registry covering memory + CLAUDE.md (global + workspace) + project NOTES + settings + KB sample
23 manual probes + auto-probe templates per claim type
Cost: ~$3/month for daily/weekly/monthly + event-driven runs

Total build time: 8 hours, one session. 14 hardening features, full test corpus.

The hardest part wasn’t the code. It was trusting the audit results enough to fix what it found — many of the things I “always knew” were minor inconsistencies I kept ignoring. The audit forces confrontation: either justify (add explanatory context) or fix (update the file).

What’s next

I’m building toward:

Reverse audit — scan the codebase for facts that should be in memory but aren’t (proactive bootstrap).
Drift detection — trend the RED count over time, alert when > moving avg + 2 stddev.
Auto-fix Tier A — high-confidence + reversible findings → apply patches automatically.
Adversarial test corpus — synthetic stale facts with known answers, measure precision/recall on a benchmark.
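
The drift-detection rule in the list above (alert when the RED count exceeds the moving average plus 2 standard deviations) is small enough to sketch now. This is a hypothetical illustration of planned work: the function name and the 8-run window are assumptions, and only the 2-stddev factor comes from the list.

```python
from statistics import mean, stdev


def red_count_alert(history, latest, window=8, k=2.0):
    """True if `latest` exceeds mean + k * stddev of the last `window` runs."""
    recent = history[-window:]
    if len(recent) < 2:
        return False  # not enough data to estimate spread
    threshold = mean(recent) + k * stdev(recent)
    return latest > threshold
```

With a history hovering around 2 RED findings per run, a jump to 10 would trip the alert while a run with 3 would not.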

Bigger picture: I think every personal AI setup needs an audit layer. Some will use mine. Most will build their own. A few teams will productize it. All of them will be glad they have it the first time the AI confidently asserts something stale.

Until AI can no longer lie to you, you’re flying blind.

Built and shipped 2026-05-08 in 8 hours. Source: ~/.claude/hooks/audit-knowledge/. Inspired by the realization that I had quietly trusted AI memory for 6 months without validating once.