Using Claude Code on Large Codebases: Why It Struggles and How to Fix It

Claude Code is impressive on small projects. Give it a 5,000-line Express app, and it navigates confidently — finding bugs, implementing features, refactoring with precision. The experience is genuinely transformative for solo developers and small teams.
Then you point it at a 200,000-line monorepo and everything degrades. Responses slow down. It references files that don't exist. It makes changes in the wrong module. It spends $4 in tokens just figuring out where your authentication logic lives. The tool that felt like a 10x multiplier on your side project feels like a confused intern on your production codebase.
This scaling problem isn't a bug in Claude Code. It's a fundamental limitation of how LLM-based agents interact with code — and it's fixable.
The Scaling Problem: Where It Breaks Down
Claude Code works by reading your code, building a mental model, and then making changes based on that model. On small codebases, this loop is fast and accurate. The agent can read the entire relevant portion of the project, understand it fully, and act with confidence.
On large codebases, this loop breaks at every step.
The context window fills with irrelevant files. Claude Code has a finite context window — roughly 200K tokens for Sonnet 4. A large codebase can have millions of tokens of source code. The agent can only see a fraction of the codebase at any time, and it has no principled way to decide which fraction to load. So it reads files heuristically: starting from the file you mentioned, following imports, reading nearby files. This works when "nearby" means 5 files. It fails when "nearby" could mean 50.
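To make that heuristic concrete, here is a minimal sketch of an import-following walk in TypeScript. The breadth-first order, the 4-characters-per-token estimate, and the naive `.ts` resolution are illustrative assumptions, not Claude Code's actual strategy:

```typescript
import { readFileSync } from "node:fs";
import { dirname, resolve } from "node:path";

// Rough sketch of heuristic exploration: start from one file, follow
// relative imports breadth-first, stop when a token budget runs out.
function exploreFromFile(entry: string, tokenBudget: number): string[] {
  const visited = new Set<string>();
  const queue = [resolve(entry)];
  let tokensUsed = 0;

  while (queue.length > 0 && tokensUsed < tokenBudget) {
    const file = queue.shift()!;
    if (visited.has(file)) continue;

    let source: string;
    try {
      source = readFileSync(file, "utf8");
    } catch {
      continue; // guessed a path that does not exist
    }
    visited.add(file);
    tokensUsed += Math.ceil(source.length / 4); // crude token estimate

    // "Nearby" means: reachable through relative imports.
    for (const match of source.matchAll(/from\s+["'](\.[^"']+)["']/g)) {
      queue.push(resolve(dirname(file), `${match[1]}.ts`));
    }
  }
  return [...visited];
}
```

On 5 files, this loop terminates quickly and cheaply. On 50, the budget runs out long before the relevant code is covered.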
Exploration takes disproportionately more tokens. On a 5K-line project, finding the authentication module takes 2-3 file reads. On a 200K-line monorepo, it might take 15-30 file reads, checking `src/auth`, `packages/auth`, `libs/authentication`, `services/identity`, `modules/user/auth`, and various other locations before finding the right one. Each wrong guess costs 500-2,000 tokens. A single exploration sequence can burn $0.50-1.50 before any productive work begins.
Hallucination probability increases with codebase size. When Claude Code can't find something, it sometimes fabricates it. On a small codebase, fabrication is rare because the agent has seen most of the relevant code. On a large codebase, the agent has seen maybe 5-10% of the code, and the probability of confidently referencing a function that doesn't exist — or exists differently than assumed — increases sharply.
The relationship between codebase size and agent performance isn't linear. It's closer to a cliff: Claude Code works well up to roughly 50K-80K lines, degrades noticeably between 80K-150K lines, and struggles significantly above 150K lines.
Symptoms You'll Recognize
If you're using Claude Code on a large codebase, you've probably experienced some or all of these:
Slow responses. Not network slow — cognitively slow. The agent takes 30-60 seconds to respond because it's reading file after file, building context incrementally. On a small project, the same task would take 5-10 seconds.
Wrong file references. "I've updated the authentication handler in `src/services/auth/handler.ts`" — but that file doesn't exist. Your auth handler is in `packages/core/auth/handlers/login.ts`. The agent hallucinated a plausible path based on patterns it's seen in other codebases.
Incomplete changes. Claude Code modifies 3 of the 7 files that need changes because it didn't discover the other 4. It updates the function signature in the source file but misses the callers in other modules. It changes the API route but forgets the corresponding client-side call.
Repeated exploration. You ask Claude Code to fix a bug. It reads 15 files to understand the context. You ask a follow-up question. It reads 12 of the same files again because the first exploration has scrolled out of the context window. You're paying double for the same understanding.
Confidence without correctness. The agent declares "I've made the necessary changes" with full confidence, but the changes break because they were based on an incomplete understanding of the dependency chain. This is particularly dangerous because the breakage might not surface until runtime.
The Root Cause: No Structural Understanding
All these symptoms trace to a single root cause: Claude Code has no structural understanding of your codebase.
It doesn't know that `UserService` depends on `DatabaseClient` which depends on `ConnectionPool` which reads from `config.database`. It doesn't know that changing the `authenticate()` function signature requires updating 14 callers across 8 files. It doesn't know that `packages/api` and `packages/web` share types from `packages/shared` and that changing a shared type affects both.
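To picture what that missing knowledge looks like, here is a hypothetical rendering of the chain above. The class names come from the example; the shapes are invented for illustration:

```typescript
// Hypothetical modules encoding the dependency chain described above.
// The point is the structure: a change at any layer ripples upward.

interface AppConfig {
  database: { url: string }; // "config.database" in the text above
}

class ConnectionPool {
  constructor(readonly url: string) {}
}

class DatabaseClient {
  constructor(readonly pool: ConnectionPool) {}
}

class UserService {
  constructor(readonly db: DatabaseClient) {}

  // Changing this signature means updating every caller (14 of them
  // across 8 files in the example), none of which are visible from
  // this file alone.
  async authenticate(email: string, password: string): Promise<boolean> {
    return email.length > 0 && password.length > 0; // placeholder logic
  }
}

// Wiring that encodes the chain end to end:
const config: AppConfig = { database: { url: "postgres://localhost/app" } };
const service = new UserService(
  new DatabaseClient(new ConnectionPool(config.database.url)),
);
```

None of this wiring is visible from any single file; it only emerges from tracing imports and call sites across the codebase.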
Without this structural map, every task becomes an exploration problem. The agent has to reconstruct the dependency graph from scratch, every session, by reading files and following import statements. On a large codebase, this reconstruction is:
- Expensive — 40-60% of total session tokens go to exploration
- Incomplete — the agent rarely discovers all relevant relationships
- Ephemeral — the understanding evaporates when the session ends or the context window shifts
This is the fundamental problem. The agent is trying to navigate a city without a map, reading every street sign it passes and hoping to build a mental model that covers the relevant area. In a small town (a small codebase), this works. In a metropolis (a large codebase), it's hopelessly inefficient.
The Fix: Dependency Graphs Give Structural Awareness
The solution is to give Claude Code the map it's missing. A dependency graph indexes every symbol in your codebase — functions, classes, types, modules — and maps the relationships between them: calls, imports, type references, inheritance chains.
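As a mental model, such a graph can be as simple as typed nodes and edges, indexed in both directions. This is a generic sketch, not vexp's actual schema:

```typescript
// Generic dependency graph shape: symbol nodes plus typed edges,
// with adjacency lists both ways so "what does X use?" and
// "who uses X?" are equally cheap lookups.

type SymbolKind = "function" | "class" | "type" | "module";
type EdgeKind = "calls" | "imports" | "references-type" | "extends";

interface SymbolNode {
  id: string; // e.g. "packages/core/auth/handlers/login.ts#login"
  name: string;
  kind: SymbolKind;
  file: string;
}

interface Edge {
  from: string; // SymbolNode.id
  to: string;   // SymbolNode.id
  kind: EdgeKind;
}

interface DependencyGraph {
  nodes: Map<string, SymbolNode>;
  outgoing: Map<string, Edge[]>; // dependencies of a symbol
  incoming: Map<string, Edge[]>; // callers / dependents of a symbol
}
```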
When Claude Code needs to work on the authentication module, instead of exploring the codebase file by file, it queries the graph: "What files and symbols are related to authentication?" The graph returns a ranked list of relevant code, including:
- The authentication functions themselves
- All callers of those functions
- All dependencies those functions use
- Types and interfaces involved
- Files that historically change together (change coupling)
This transforms the task from open-ended exploration to targeted retrieval. The agent receives exactly the code it needs in its context window, with no wasted reads and no missed dependencies.
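A minimal sketch of what such a retrieval query might look like, reusing the `DependencyGraph` shape from the sketch above (ranking and change coupling omitted; this is not vexp's actual API):

```typescript
// Targeted retrieval: from a seed symbol, collect callers and
// dependencies out to a fixed depth instead of exploring files.
function relatedSymbols(
  graph: DependencyGraph,
  seedId: string,
  maxDepth = 2,
): Set<string> {
  const found = new Set<string>([seedId]);
  let frontier = [seedId];

  for (let depth = 0; depth < maxDepth; depth++) {
    const next: string[] = [];
    for (const id of frontier) {
      const neighbors = [
        ...(graph.outgoing.get(id) ?? []).map((e) => e.to),   // dependencies
        ...(graph.incoming.get(id) ?? []).map((e) => e.from), // callers
      ];
      for (const n of neighbors) {
        if (!found.has(n)) {
          found.add(n);
          next.push(n);
        }
      }
    }
    frontier = next;
  }
  return found;
}
```

The traversal touches only symbols connected to the seed, which is why retrieval cost scales with the size of the relevant neighborhood rather than the size of the codebase.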
The impact on large codebases is dramatic:
- Exploration overhead drops by 80-90%. The agent doesn't need to read 20 files to find the right 5 — the graph serves them directly.
- Hallucination drops significantly. The agent is working from real, verified code relationships, not inferred patterns.
- Multi-file changes become reliable. The graph shows all affected files, not just the ones the agent stumbles across.
- Session costs drop by 58% on average — and the reduction is larger on bigger codebases because the exploration waste was larger.
Practical Setup: Getting vexp Running on a Large Codebase
Setting up vexp on a large codebase involves four steps: installation, indexing, MCP configuration, and a quick verification.
Step 1: Install and Initialize
```bash
npm install -g vexp-cli
cd /path/to/your/large-project
vexp init
```
The `init` command creates a `.vexp/manifest.json` in your project root. This manifest tracks file hashes and is designed to be committed to git — it's lightweight (just blake3 hashes) and lets team members share the index structure.
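For intuition, a manifest of this kind plausibly boils down to a map from file paths to content hashes. This TypeScript interface is a hypothetical shape, not vexp's documented format:

```typescript
// Hypothetical manifest shape -- not vexp's documented format.
interface VexpManifest {
  version: number;
  // Relative file path -> blake3 hash of the file's contents.
  files: Record<string, string>;
}
```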
Step 2: Indexing
Indexing runs automatically after initialization. On a large codebase, timing depends on size:
- 50K lines: ~10 seconds
- 200K lines: ~30-45 seconds
- 500K lines: ~60-90 seconds
The index is stored locally in `.vexp/index.db` (gitignored) and maps every symbol, import, call, and type reference in your codebase. For a 200K-line TypeScript monorepo, the index typically identifies 15,000-40,000 symbols and 50,000-150,000 relationships between them.
Incremental re-indexing happens automatically when files change. Only modified files are re-parsed, making subsequent indexes nearly instant.
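The mechanism behind this is plain hash comparison: recompute each file's content hash and re-parse only the files whose hash no longer matches the stored entry. A self-contained sketch, using SHA-256 from Node's crypto module as a stand-in for blake3 (which Node's built-in crypto does not ship):

```typescript
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

// Hash-based change detection: a file needs re-indexing only if its
// current content hash differs from the previously recorded one.
function filesToReindex(
  manifest: Record<string, string>, // path -> previously recorded hash
  paths: string[],
): string[] {
  return paths.filter((path) => {
    const hash = createHash("sha256").update(readFileSync(path)).digest("hex");
    return manifest[path] !== hash;
  });
}
```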
Step 3: MCP Configuration
Add vexp to your Claude Code MCP settings (`.claude/settings.json` or your global settings):
```json
{
  "mcpServers": {
    "vexp": {
      "command": "vexp-core",
      "args": ["mcp", "--workspace", "."]
    }
  }
}
```
From this point, Claude Code automatically uses vexp for context retrieval. When you ask "fix the authentication bug," the agent queries vexp for authentication-related symbols instead of exploring the filesystem.
Step 4: Verify
Run a quick test to confirm the setup:
```
> Use index_status to check vexp indexing
```
Claude Code should report the number of indexed files, symbols, and the index health status. If the counts match your codebase size, the setup is complete.
Results on Real Large Codebases
Benchmark data from production codebases shows consistent improvements at scale:
Token reduction by codebase size:
- 10K-50K lines: 45-55% reduction (exploration overhead was moderate, so the reduction is moderate)
- 50K-150K lines: 55-65% reduction (exploration overhead was significant, graph eliminates it)
- 150K-500K lines: 60-70% reduction (exploration overhead dominated sessions, graph eliminates nearly all of it)
Task completion accuracy:
- Multi-file changes on 200K+ line codebases go from ~60% correct (agent misses affected files) to ~90% correct (graph identifies all affected files)
- File reference accuracy goes from ~75% (agent guesses plausible paths) to ~98% (graph provides verified paths)
Session cost on a 200K-line monorepo:
- Without vexp: typical task costs $2-5 in tokens (heavy exploration)
- With vexp: same task costs $0.80-2.00 (targeted retrieval)
- Monthly savings for a developer running 8-10 tasks/day: $150-300
Response latency:
- Without structural context: 20-45 seconds for initial response (reading files)
- With structural context: 8-15 seconds (relevant context is served from the index, minimal file reads)
The improvement is most visible on the tasks that were most painful without it: cross-module refactors, dependency chain debugging, and feature implementations that touch multiple packages. These tasks shift from "Claude Code needs heavy hand-holding" to "Claude Code handles it autonomously."
Making It Work: Best Practices for Large Codebases
Beyond the basic setup, a few practices maximize Claude Code's effectiveness on large codebases:
Use CLAUDE.md for architectural context. Give the agent a 3-5 sentence overview of your project structure — what the major modules are, where they live, how they relate. This complements the dependency graph with human-level architectural intent.
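For example, a hypothetical CLAUDE.md overview for the monorepo used in the examples above might read:

```
# Project overview
pnpm monorepo. packages/api is the Express backend, packages/web is the
React frontend, and packages/shared holds the types both of them import.
Authentication lives in packages/core/auth. Generated output under
packages/*/dist is never edited by hand.
```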
Scope your prompts. Instead of "fix the bug," say "fix the session timeout bug in packages/auth." The more specific the starting point, the faster the graph retrieval and the less exploration needed.
Commit your manifest. The `.vexp/manifest.json` file should be in git. Team members who clone the repo can run `vexp init` and get an index based on the shared manifest, without re-analyzing the full codebase from scratch.
Keep sessions focused. Even with a dependency graph, large codebase sessions benefit from the one-task-per-session discipline. Cross-module context from one task rarely helps with the next.
The core insight is simple: Claude Code's capabilities don't degrade on large codebases — its ability to find and load the right context does. Give it structural awareness through a dependency graph, and the 200K-line monorepo experience starts to feel like the 5K-line project experience. The agent knows where things are, understands how they connect, and makes changes with confidence backed by verified relationships rather than guesses.
Nicola
Developer and creator of vexp — a context engine for AI coding agents. I build tools that make AI coding assistants faster, cheaper, and actually useful on real codebases.