Karpathy's LLM Wiki, for code

Why your AI coding agent keeps duplicating functions — and a markdown-only fix.

Every coding session, the agent starts over.

It reads your CLAUDE.md or AGENTS.md. It sees the file tree. But neither tells it what already exists and where. So when you ask it to add a date formatter that handles ISO timestamps with timezone offsets, it does the rational thing: it assumes the function doesn't exist, and writes a new one. Last week's session already wrote that function. The week before, a slightly different one. None of them know about the others.

This is the fresh-start problem, and it's why large vibe-coded repos degrade. You end up with three slightly different auth helpers. Two date formatters that disagree on edge cases. A "new" component that's eighty percent an old one. Each session leaves the codebase a little harder for the next session to understand.

The instinct is to throw RAG at it. Embed the repo, retrieve relevant chunks at query time. This works, but it's overkill for what's mostly a static problem. Code RAG is also finicky in ways that bite you: chunking semantics is hard, embeddings miss the structural relationships that matter most (this function is called by these three other functions; this module owns this concept), and you pay infrastructure cost on every session for a problem the file system could mostly solve.

There's a lower-tech win first.

Karpathy already wrote the spec

In April 2026, Andrej Karpathy posted a gist describing what he calls an LLM Wiki. The pattern is straightforward:

Raw sources — your immutable collection of documents. Articles, papers, notes. The LLM reads from this layer but never writes to it.
The wiki — an LLM-generated tree of markdown files. Summaries, entity pages, concept pages. The LLM owns this layer entirely — creating pages, updating cross-references, keeping things consistent.
The schema — a CLAUDE.md (or AGENTS.md) that tells the LLM how the wiki is structured, what the conventions are, and what to do when ingesting new sources, answering questions, or maintaining the wiki.

Two special files at the wiki root: index.md is content-oriented — a catalog of every page with a one-line summary. log.md is chronological — an append-only record of every change.

Karpathy pitched the pattern for personal knowledge bases: research notes, book companions, business team wikis. The metaphor he uses in passing — "Obsidian is the IDE; the LLM is the programmer; the wiki is the codebase" — points at exactly what I want to talk about. We're taking the metaphor literally.

Apply it to code

The adaptation is small. Three changes:

1. One INDEX.md per directory, not just at the repo root.

A single root-level index works for a personal wiki because the page set is flat-ish — entity pages, concept pages, a few categories. A codebase is hierarchical. You want to be able to ask the agent "what's in backend/services?" and have it read one ~50-line file, not parse a 5,000-line monorepo catalog. So the index lives where the code does: each directory gets its own INDEX.md covering only its immediate children.
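To make "covering only its immediate children" concrete, here is a minimal sketch of how that scope could be computed from a flat list of repo-relative paths. The helper name `immediateChildren` and the sample file list are my own illustration, not part of Karpathy's gist; subdirectories are marked with a trailing slash.

```typescript
// Hypothetical helper: given repo-relative file paths, list what one
// directory's INDEX.md should cover (its immediate children only).
function immediateChildren(paths: string[], dir: string): string[] {
  // Treat "backend/services/" and "backend/services" as the same directory.
  const trimmed = dir.replace(/\/+$/, "");
  const prefix = trimmed === "" ? "" : trimmed + "/";
  const seen: { [child: string]: true } = {};

  for (const path of paths) {
    if (path.indexOf(prefix) !== 0) continue; // outside this directory
    const rest = path.slice(prefix.length);
    if (rest === "" || rest === "INDEX.md") continue; // the index does not list itself
    const slash = rest.indexOf("/");
    // A nested path contributes only its first segment, marked as a directory.
    const child = slash === -1 ? rest : rest.slice(0, slash + 1);
    seen[child] = true;
  }
  return Object.keys(seen).sort();
}

// Sample layout (made up for this example).
const repoFiles = [
  "INDEX.md",
  "backend/INDEX.md",
  "backend/config/models.ts",
  "backend/services/auth.ts",
  "backend/services/billing/invoice.ts",
  "utils/dates.ts",
];

const rootChildren = immediateChildren(repoFiles, "");
// ["backend/", "utils/"]
const serviceChildren = immediateChildren(repoFiles, "backend/services");
// ["auth.ts", "billing/"]
```

The root index only points at `backend/` and `utils/`; the detail about `billing/invoice.ts` lives two levels down, which is what keeps each file short.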

2. One log.md at the repo root.

Same as Karpathy's pattern. Append-only. Each session ends with a short entry: date, task summary, files changed, key decisions or gotchas. After a few weeks this becomes the most useful thing in the repo — better than git log because it captures intent.
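As a rough sketch, a log entry could be modeled and rendered like this. The `LogEntry` field names, the heading layout, and the sample entry (including its date) are assumptions for illustration; the only requirements from the pattern are the four pieces of information and the append-only behavior.

```typescript
// Hypothetical shape for one log.md entry; the field names are assumptions.
interface LogEntry {
  date: string; // ISO date string
  summary: string;
  filesChanged: string[];
  notes: string[]; // key decisions or gotchas
}

// Render an entry as a markdown block.
function renderLogEntry(entry: LogEntry): string {
  const lines = [`## ${entry.date}: ${entry.summary}`, "", "Files changed:"];
  for (const file of entry.filesChanged) lines.push(`- ${file}`);
  if (entry.notes.length > 0) {
    lines.push("", "Decisions and gotchas:");
    for (const note of entry.notes) lines.push(`- ${note}`);
  }
  return lines.join("\n") + "\n";
}

// Append-only: existing content is never rewritten, the entry goes at the end.
function appendToLog(existingLog: string, entry: LogEntry): string {
  if (existingLog === "") return renderLogEntry(entry);
  const endsWithNewline = existingLog.slice(-1) === "\n";
  return existingLog + (endsWithNewline ? "" : "\n") + "\n" + renderLogEntry(entry);
}

// Sample entry (made up for this example).
const updatedLog = appendToLog("# Log\n", {
  date: "2026-05-01",
  summary: "Add relative-time formatting",
  filesChanged: ["utils/dates.ts", "utils/INDEX.md"],
  notes: ["Reuse utils/dates.ts for anything time-related"],
});
```

In practice the agent writes this entry itself at the end of a session; the sketch only pins down what "short entry" means.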

3. Two rules in CLAUDE.md (or AGENTS.md).

Starting a task: Read the root INDEX.md to orient. Then read the relevant subdirectory INDEX.md files to find specific files. Read source files only after identifying them via indexes — avoid blind exploration.

Finishing a task: If you added, renamed, or deleted files, update the relevant INDEX.md. Append an entry to log.md with date, task summary, files changed, and any key decisions or gotchas.

That's it. The pattern is markdown plus discipline.
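The two rules boil down to a pair of path lookups. The sketch below shows which index files the rules imply for a given task; the function names `indexesToRead` and `indexesToUpdate` are hypothetical, and in practice the agent follows the rules in prose rather than calling anything.

```typescript
// Starting a task: root index first, then each directory down to the target.
function indexesToRead(targetDir: string): string[] {
  const segments = targetDir.split("/").filter((s) => s !== "");
  const result = ["INDEX.md"];
  let current = "";
  for (const segment of segments) {
    current = current === "" ? segment : `${current}/${segment}`;
    result.push(`${current}/INDEX.md`);
  }
  return result;
}

// Finishing a task: every directory that gained, lost, or renamed a file
// has an index that needs a look.
function indexesToUpdate(changedFiles: string[]): string[] {
  const found: string[] = [];
  for (const file of changedFiles) {
    const slash = file.lastIndexOf("/");
    const indexPath = slash === -1 ? "INDEX.md" : `${file.slice(0, slash)}/INDEX.md`;
    if (file === indexPath) continue; // an index edit alone does not trigger itself
    if (found.indexOf(indexPath) === -1) found.push(indexPath);
  }
  return found.sort();
}

const readOrder = indexesToRead("backend/services");
// ["INDEX.md", "backend/INDEX.md", "backend/services/INDEX.md"]
const toUpdate = indexesToUpdate(["backend/services/auth.ts", "utils/dates.ts", "README.md"]);
// ["INDEX.md", "backend/services/INDEX.md", "utils/INDEX.md"]
```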

What an index entry actually looks like

Each entry answers three questions:

What's this module's job?
What does it export to the rest of the repo?
What's the don't-touch-without-reading-X gotcha?

A real line from one of my own backend/INDEX.md files:

config/models.ts — maps tiers (fast/balanced/advanced) to model names.
Never hardcode model strings elsewhere; the DB stores the tier, not "claude-sonnet-4-6".

That single line saves an LLM from a specific antipattern it commits constantly on autopilot. Ask any coding agent to "use Claude Sonnet" and it will cheerfully write "claude-sonnet-4-6" directly into your schema migration. The index entry preempts that — when the agent reads the index before starting, it sees the abstraction layer and the reason for it. It uses the tier, not the model string.
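For readers who haven't seen this kind of abstraction layer, here is a minimal sketch of what a file like config/models.ts might contain. Only the three tier names and the string "claude-sonnet-4-6" come from the index entry above; which tier that model belongs to, the placeholder strings, and the names `MODEL_BY_TIER`, `isTier`, and `resolveModel` are all assumptions.

```typescript
// Sketch of a tier-to-model map. In a real module these would be exported.
type Tier = "fast" | "balanced" | "advanced";

const MODEL_BY_TIER: Record<Tier, string> = {
  fast: "placeholder-fast-model",         // placeholder, not a real model ID
  balanced: "claude-sonnet-4-6",          // assumed to sit in this tier
  advanced: "placeholder-advanced-model", // placeholder, not a real model ID
};

// Guard for values read back from the database, where only the tier is stored.
function isTier(value: string): value is Tier {
  return Object.prototype.hasOwnProperty.call(MODEL_BY_TIER, value);
}

// The only place a stored tier turns into a model string.
function resolveModel(storedTier: string): string {
  if (!isTier(storedTier)) throw new Error(`Unknown tier: ${storedTier}`);
  return MODEL_BY_TIER[storedTier];
}

const balancedModel = resolveModel("balanced");
```

Swapping the model behind a tier is then a one-line change here, with no migration, because nothing else in the repo ever stored the model string.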

Repeat that across a few dozen index entries ("this is a singleton, don't instantiate"; "use the proxy route, never raw R2 URLs"; "this util already handles ISO timestamps") and the agent stops making the same five mistakes every session.
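If it helps to think of an entry as data, the three questions map onto three fields. This is only a sketch: the `IndexEntry` shape and `renderIndexEntry` are hypothetical, and the `path: job` separator is an arbitrary choice. The sample entry reuses the config/models.ts line from earlier; its exports list is left empty because that line doesn't name any.

```typescript
// Hypothetical structure for one INDEX.md entry, one field per question.
interface IndexEntry {
  path: string;      // file or subdirectory, relative to this INDEX.md
  job: string;       // what is this module's job?
  exports: string[]; // what does it offer the rest of the repo?
  gotcha?: string;   // what must you know before touching it?
}

// Render the entry as plain markdown lines; empty fields are skipped.
function renderIndexEntry(entry: IndexEntry): string {
  const lines = [`${entry.path}: ${entry.job}`];
  if (entry.exports.length > 0) lines.push(`Exports: ${entry.exports.join(", ")}.`);
  if (entry.gotcha) lines.push(entry.gotcha);
  return lines.join("\n");
}

const modelsEntry = renderIndexEntry({
  path: "config/models.ts",
  job: "maps tiers (fast/balanced/advanced) to model names.",
  exports: [],
  gotcha: 'Never hardcode model strings elsewhere; the DB stores the tier, not "claude-sonnet-4-6".',
});
```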

Before / after

Before the index pattern:

Me: Write me a function that formats an ISO timestamp into "5 minutes ago" in the user's local timezone.
Agent: writes a new function in a new file, importing date-fns, which is already in the project for exactly this purpose.

After:

Agent: reads root INDEX.md → reads utils/INDEX.md → sees dates.ts — ISO + timezone-safe relative-time formatting. Use this for anything time-related; don't reinvent. → imports the existing helper, writes three lines.

Across one session, the agent reads maybe five to ten small files instead of grepping blind or stuffing the whole repo into context. Across many sessions, the codebase becomes more navigable, not less. Karpathy's "compounding artifact" applied to code.
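For illustration, the kind of helper the agent finds in utils/dates.ts might look roughly like the sketch below. The project described above uses date-fns; this version swaps that for plain arithmetic so the example stays dependency-free. The function name `formatRelative`, the unit thresholds, and the wording for future or very recent timestamps are all assumptions.

```typescript
// Hypothetical relative-time helper. `now` is injectable so the function
// is deterministic in tests.
function formatRelative(isoTimestamp: string, now: Date = new Date()): string {
  const then = Date.parse(isoTimestamp); // honors offsets such as +02:00 or Z
  if (isNaN(then)) throw new Error(`Invalid ISO timestamp: ${isoTimestamp}`);

  const seconds = Math.floor((now.getTime() - then) / 1000);
  if (seconds < 0) return "in the future";
  if (seconds < 60) return "just now";

  // Largest unit first, so 90000 seconds reads as "1 day ago".
  const units: Array<[string, number]> = [
    ["day", 86400],
    ["hour", 3600],
    ["minute", 60],
  ];
  for (const [label, size] of units) {
    if (seconds >= size) {
      const count = Math.floor(seconds / size);
      return `${count} ${label}${count === 1 ? "" : "s"} ago`;
    }
  }
  return "just now"; // unreachable, keeps the return type total
}

// 13:55 at +02:00 is 11:55 UTC, five minutes before the reference time.
const reference = new Date("2026-01-01T12:00:00Z");
const relativeLabel = formatRelative("2026-01-01T13:55:00+02:00", reference);
// "5 minutes ago"
```

The "three lines" the agent writes in the after case are then just an import and a call to a helper like this one.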

Why this is cheap

No vector DB. No embedding pipeline. No retrieval tuning.

Just markdown files committed to the repo. Regenerated when the code in a directory changes — a pre-commit hook, a nightly job, or a manual script run at the end of a working session. The INDEX.md is the artifact. Git tracks it like any other file. You get version history, blame, and PR-time review of how the index evolves alongside the code it describes.
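Whichever trigger you pick, the check behind it can be very small. Here is a sketch of a staleness test a hook or nightly job could run per directory; the name `missingFromIndex`, the sample index text, and the file `strings.ts` are hypothetical, and the substring match is deliberately crude.

```typescript
// Hypothetical staleness check: flag children of a directory that the
// directory's INDEX.md never mentions.
function missingFromIndex(indexText: string, children: string[]): string[] {
  return children.filter((child) => {
    // Compare on the bare name so "billing/" matches "billing" in the index.
    const bare = child.replace(/\/+$/, "");
    return indexText.indexOf(bare) === -1;
  });
}

// Sample index content (made up for this example).
const utilsIndex = [
  "dates.ts: ISO + timezone-safe relative-time formatting.",
  "Use this for anything time-related; don't reinvent.",
].join("\n");

const stale = missingFromIndex(utilsIndex, ["dates.ts", "strings.ts"]);
// ["strings.ts"]: the file exists but the index never mentions it.
```

A non-empty result is a signal to regenerate that one INDEX.md, not the whole tree.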

When the agent reads an index file, it costs you maybe two hundred tokens. When it reads ten of them across a session, that's about two thousand tokens, often less than a single tool call returns. The economics look nothing like RAG's.

How to set this up

Bootstrapping the indexes is the part nobody wants to do by hand — but you don't need a separate tool to do it. Paste the code-wiki prompt into whichever AI coding tool you already use (Cursor, Claude Code, Codex CLI, OpenCode, whatever) and the agent walks your repo and generates the INDEX.md files itself. The output is better than what a script would produce, because the agent has full context for your stack, your conventions, and your existing CLAUDE.md.

This is deliberately not a CLI. Karpathy made the case in his original LLM Wiki gist and the tweet that introduced it: in the era of capable LLM agents, the right artifact to share is the idea itself — your agent customizes it for your repo. A separate Python tool is friction you don't need.

One paste, a few minutes of spot-check editing, and the pattern is live in your repo.

What this is and isn't

This is not a replacement for embeddings on huge codebases. If you're navigating a million-line monorepo across hundreds of services, you probably do want a real search layer. For the repo sizes most of us actually work on (say, ten thousand to two hundred thousand lines), markdown indexes are enough.

This is not a magic bullet. The pattern only works if the agent actually reads the indexes, which is what the CLAUDE.md rules are for. The rules need to be loaded into every session; if your agent ignores them, the indexes go stale and the whole thing collapses back into the fresh-start problem.

And the pattern itself isn't novel. Karpathy wrote the spec. The contribution here is the per-directory adaptation and the two rules that make it work for active coding rather than reading.

If you've been losing afternoons to "the agent rewrote the thing I already wrote last week," this is the cheapest fix I've found. Try it on one subdirectory first. The compounding starts immediately.

If you're building with AI coding tools (Cursor, Lovable, Bolt, Codex CLI, whatever the next one is called) and you're hitting the wall where your shipped-in-an-afternoon project starts breaking in production, my Vibe Coder Rescue agent is at proagentme.com. Twenty-four-hour AI; one-click escalation to me when AI isn't enough.