Using Claude Code on Large Codebases: Why It Struggles and How to Fix It

Claude Code is impressive on small projects. Give it a 5,000-line Express app, and it navigates confidently — finding bugs, implementing features, refactoring with precision. The experience is genuinely transformative for solo developers and small teams.
Then you point it at a 200,000-line monorepo and everything degrades. Responses slow down. It references files that don't exist. It makes changes in the wrong module. It spends $4 in tokens just figuring out where your authentication logic lives. The tool that felt like a 10x multiplier on your side project feels like a confused intern on your production codebase.
This scaling problem isn't a bug in Claude Code. It's a fundamental limitation of how LLM-based agents interact with code — and it's fixable.
The Scaling Problem: Where It Breaks Down
Claude Code works by reading your code, building a mental model, and then making changes based on that model. On small codebases, this loop is fast and accurate. The agent can read the entire relevant portion of the project, understand it fully, and act with confidence.
On large codebases, this loop breaks at every step.
The context window fills with irrelevant files. Claude Code has a finite context window — roughly 200K tokens for Sonnet 4. A large codebase can have millions of tokens of source code. The agent can only see a fraction of the codebase at any time, and it has no principled way to decide which fraction to load. So it reads files heuristically: starting from the file you mentioned, following imports, reading nearby files. This works when "nearby" means 5 files. It fails when "nearby" could mean 50.
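To make that heuristic concrete, here is a minimal sketch of an import-following walk in TypeScript. The breadth-first order, the 4-characters-per-token estimate, and the naive `.ts` resolution are illustrative assumptions, not Claude Code's actual strategy:

```typescript
import { readFileSync } from "node:fs";
import { dirname, resolve } from "node:path";

// Rough sketch of heuristic exploration: start from one file, follow
// relative imports breadth-first, stop when a token budget runs out.
function exploreFromFile(entry: string, tokenBudget: number): string[] {
  const visited = new Set<string>();
  const queue = [resolve(entry)];
  let tokensUsed = 0;

  while (queue.length > 0 && tokensUsed < tokenBudget) {
    const file = queue.shift()!;
    if (visited.has(file)) continue;

    let source: string;
    try {
      source = readFileSync(file, "utf8");
    } catch {
      continue; // guessed a path that does not exist
    }
    visited.add(file);
    tokensUsed += Math.ceil(source.length / 4); // crude token estimate

    // "Nearby" means: reachable through relative imports.
    for (const match of source.matchAll(/from\s+["'](\.[^"']+)["']/g)) {
      queue.push(resolve(dirname(file), `${match[1]}.ts`));
    }
  }
  return [...visited];
}
```

On 5 files, this loop terminates quickly and cheaply. On 50, the budget runs out long before the relevant code is covered.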
Exploration takes disproportionately more tokens. On a 5K-line project, finding the authentication module takes 2-3 file reads. On a 200K-line monorepo, it might take 15-30 file reads, checking `src/auth`, `packages/auth`, `libs/authentication`, `services/identity`, `modules/user/auth`, and various other locations before finding the right one. Each wrong guess costs 500-2,000 tokens. A single exploration sequence can burn $0.50-1.50 before any productive work begins.
Hallucination probability increases with codebase size. When Claude Code can't find something, it sometimes fabricates it. On a small codebase, fabrication is rare because the agent has seen most of the relevant code. On a large codebase, the agent has seen maybe 5-10% of the code, and the probability of confidently referencing a function that doesn't exist — or exists differently than assumed — increases sharply.
The relationship between codebase size and agent performance isn't linear. It's closer to a cliff: Claude Code works well up to roughly 50K-80K lines, degrades noticeably between 80K-150K lines, and struggles significantly above 150K lines.
Symptoms You'll Recognize
If you're using Claude Code on a large codebase, you've probably experienced some or all of these:
Slow responses. Not network slow — cognitively slow. The agent takes 30-60 seconds to respond because it's reading file after file, building context incrementally. On a small project, the same task would take 5-10 seconds.
Wrong file references. "I've updated the authentication handler in `src/services/auth/handler.ts`" — but that file doesn't exist. Your auth handler is in `packages/core/auth/handlers/login.ts`. The agent hallucinated a plausible path based on patterns it's seen in other codebases.
Incomplete changes. Claude Code modifies 3 of the 7 files that need changes because it didn't discover the other 4. It updates the function signature in the source file but misses the callers in other modules. It changes the API route but forgets the corresponding client-side call.
Repeated exploration. You ask Claude Code to fix a bug. It reads 15 files to understand the context. You ask a follow-up question. It reads 12 of the same files again because the first exploration has scrolled out of the context window. You're paying double for the same understanding.
Confidence without correctness. The agent declares "I've made the necessary changes" with full confidence, but the changes break because they were based on an incomplete understanding of the dependency chain. This is particularly dangerous because the breakage might not surface until runtime.
The Root Cause: No Structural Understanding
All these symptoms trace to a single root cause: Claude Code has no structural understanding of your codebase.
It doesn't know that `UserService` depends on `DatabaseClient` which depends on `ConnectionPool` which reads from `config.database`. It doesn't know that changing the `authenticate()` function signature requires updating 14 callers across 8 files. It doesn't know that `packages/api` and `packages/web` share types from `packages/shared` and that changing a shared type affects both.
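To picture what that missing knowledge looks like, here is a hypothetical rendering of the chain above. The class names come from the example; the shapes are invented for illustration:

```typescript
// Hypothetical modules encoding the dependency chain described above.
// The point is the structure: a change at any layer ripples upward.

interface AppConfig {
  database: { url: string }; // "config.database" in the text above
}

class ConnectionPool {
  constructor(readonly url: string) {}
}

class DatabaseClient {
  constructor(readonly pool: ConnectionPool) {}
}

class UserService {
  constructor(readonly db: DatabaseClient) {}

  // Changing this signature means updating every caller (14 of them
  // across 8 files in the example), none of which are visible from
  // this file alone.
  async authenticate(email: string, password: string): Promise<boolean> {
    return email.length > 0 && password.length > 0; // placeholder logic
  }
}

// Wiring that encodes the chain end to end:
const config: AppConfig = { database: { url: "postgres://localhost/app" } };
const service = new UserService(
  new DatabaseClient(new ConnectionPool(config.database.url)),
);
```

None of this wiring is visible from any single file; it only emerges from tracing imports and call sites across the codebase.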
Without this structural map, every task becomes an exploration problem. The agent has to reconstruct the dependency graph from scratch, every session, by reading files and following import statements. On a large codebase, this reconstruction is:
- Expensive — 40-60% of total session tokens go to exploration
- Incomplete — the agent rarely discovers all relevant relationships
- Ephemeral — the understanding evaporates when the session ends or the context window shifts
This is the fundamental problem. The agent is trying to navigate a city without a map, reading every street sign it passes and hoping to build a mental model that covers the relevant area. In a small town (a small codebase), this works. In a metropolis (a large codebase), it's hopelessly inefficient.
The Fix: Dependency Graphs Give Structural Awareness
The solution is to give Claude Code the map it's missing. A dependency graph indexes every symbol in your codebase — functions, classes, types, modules — and maps the relationships between them: calls, imports, type references, inheritance chains.
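As a mental model, such a graph can be as simple as typed nodes and edges, indexed in both directions. This is a generic sketch, not vexp's actual schema:

```typescript
// Generic dependency graph shape: symbol nodes plus typed edges,
// with adjacency lists both ways so "what does X use?" and
// "who uses X?" are equally cheap lookups.

type SymbolKind = "function" | "class" | "type" | "module";
type EdgeKind = "calls" | "imports" | "references-type" | "extends";

interface SymbolNode {
  id: string; // e.g. "packages/core/auth/handlers/login.ts#login"
  name: string;
  kind: SymbolKind;
  file: string;
}

interface Edge {
  from: string; // SymbolNode.id
  to: string;   // SymbolNode.id
  kind: EdgeKind;
}

interface DependencyGraph {
  nodes: Map<string, SymbolNode>;
  outgoing: Map<string, Edge[]>; // dependencies of a symbol
  incoming: Map<string, Edge[]>; // callers / dependents of a symbol
}
```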
When Claude Code needs to work on the authentication module, instead of exploring the codebase file by file, it queries the graph: "What files and symbols are related to authentication?" The graph returns a ranked list of relevant code, including:
- The authentication functions themselves
- All callers of those functions
- All dependencies those functions use
- Types and interfaces involved
- Files that historically change together (change coupling)
This transforms the task from open-ended exploration to targeted retrieval. The agent receives exactly the code it needs in its context window, with no wasted reads and no missed dependencies.
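A minimal sketch of what such a retrieval query might look like, reusing the `DependencyGraph` shape from the sketch above (ranking and change coupling omitted; this is not vexp's actual API):

```typescript
// Targeted retrieval: from a seed symbol, collect callers and
// dependencies out to a fixed depth instead of exploring files.
function relatedSymbols(
  graph: DependencyGraph,
  seedId: string,
  maxDepth = 2,
): Set<string> {
  const found = new Set<string>([seedId]);
  let frontier = [seedId];

  for (let depth = 0; depth < maxDepth; depth++) {
    const next: string[] = [];
    for (const id of frontier) {
      const neighbors = [
        ...(graph.outgoing.get(id) ?? []).map((e) => e.to),   // dependencies
        ...(graph.incoming.get(id) ?? []).map((e) => e.from), // callers
      ];
      for (const n of neighbors) {
        if (!found.has(n)) {
          found.add(n);
          next.push(n);
        }
      }
    }
    frontier = next;
  }
  return found;
}
```

The traversal touches only symbols connected to the seed, which is why retrieval cost scales with the size of the relevant neighborhood rather than the size of the codebase.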
The impact on large codebases is dramatic:
- Exploration overhead drops by 80-90%. The agent doesn't need to read 20 files to find the right 5 — the graph serves them directly.
- Hallucination drops significantly. The agent is working from real, verified code relationships, not inferred patterns.
- Multi-file changes become reliable. The graph shows all affected files, not just the ones the agent stumbles across.
- Session costs drop by 58% on average — and the reduction is larger on bigger codebases because the exploration waste was larger.
Practical Setup: Getting vexp Running on a Large Codebase
Setting up vexp on a large codebase involves four steps: installation, indexing, MCP configuration, and a quick verification.
Step 1: Install and Initialize
```bash
npm install -g vexp-cli
cd /path/to/your/large-project
vexp init
```
The `init` command creates a `.vexp/manifest.json` in your project root. This manifest tracks file hashes and is designed to be committed to git — it's lightweight (just blake3 hashes) and lets team members share the index structure.
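For intuition, a manifest of this kind plausibly boils down to a map from file paths to content hashes. This TypeScript interface is a hypothetical shape, not vexp's documented format:

```typescript
// Hypothetical manifest shape -- not vexp's documented format.
interface VexpManifest {
  version: number;
  // Relative file path -> blake3 hash of the file's contents.
  files: Record<string, string>;
}
```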
Step 2: Indexing
Indexing runs automatically after initialization. On a large codebase, timing depends on size:
- 50K lines: ~10 seconds
- 200K lines: ~30-45 seconds
- 500K lines: ~60-90 seconds
The index is stored locally in `.vexp/index.db` (gitignored) and maps every symbol, import, call, and type reference in your codebase. For a 200K-line TypeScript monorepo, the index typically identifies 15,000-40,000 symbols and 50,000-150,000 relationships between them.
Incremental re-indexing happens automatically when files change. Only modified files are re-parsed, making subsequent indexes nearly instant.
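The mechanism behind this is plain hash comparison: recompute each file's content hash and re-parse only the files whose hash no longer matches the stored entry. A self-contained sketch, using SHA-256 from Node's crypto module as a stand-in for blake3 (which Node's built-in crypto does not ship):

```typescript
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

// Hash-based change detection: a file needs re-indexing only if its
// current content hash differs from the previously recorded one.
function filesToReindex(
  manifest: Record<string, string>, // path -> previously recorded hash
  paths: string[],
): string[] {
  return paths.filter((path) => {
    const hash = createHash("sha256").update(readFileSync(path)).digest("hex");
    return manifest[path] !== hash;
  });
}
```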
Step 3: MCP Configuration
Add vexp to your Claude Code MCP settings (`.claude/settings.json` or your global settings):
```json
{
  "mcpServers": {
    "vexp": {
      "command": "vexp-core",
      "args": ["mcp", "--workspace", "."]
    }
  }
}
```
From this point, Claude Code automatically uses vexp for context retrieval. When you ask "fix the authentication bug," the agent queries vexp for authentication-related symbols instead of exploring the filesystem.
Step 4: Verify
Run a quick test to confirm the setup:
```
> Use index_status to check vexp indexing
```
Claude Code should report the number of indexed files, symbols, and the index health status. If the counts match your codebase size, the setup is complete.
Results on Real Large Codebases
Benchmark data from production codebases shows consistent improvements at scale:
Token reduction by codebase size:
- 10K-50K lines: 45-55% reduction (exploration overhead was moderate, so the reduction is moderate)
- 50K-150K lines: 55-65% reduction (exploration overhead was significant, graph eliminates it)
- 150K-500K lines: 60-70% reduction (exploration overhead dominated sessions, graph eliminates nearly all of it)
Task completion accuracy:
- Multi-file changes on 200K+ line codebases go from ~60% correct (agent misses affected files) to ~90% correct (graph identifies all affected files)
- File reference accuracy goes from ~75% (agent guesses plausible paths) to ~98% (graph provides verified paths)
Session cost on a 200K-line monorepo:
- Without vexp: typical task costs $2-5 in tokens (heavy exploration)
- With vexp: same task costs $0.80-2.00 (targeted retrieval)
- Monthly savings for a developer running 8-10 tasks/day: $150-300
Response latency:
- Without structural context: 20-45 seconds for initial response (reading files)
- With structural context: 8-15 seconds (relevant context is served from the index, minimal file reads)
The improvement is most visible on the tasks that were most painful without it: cross-module refactors, dependency chain debugging, and feature implementations that touch multiple packages. These tasks shift from "Claude Code needs heavy hand-holding" to "Claude Code handles it autonomously."
Making It Work: Best Practices for Large Codebases
Beyond the basic setup, a few practices maximize Claude Code's effectiveness on large codebases:
Use CLAUDE.md for architectural context. Give the agent a 3-5 sentence overview of your project structure — what the major modules are, where they live, how they relate. This complements the dependency graph with human-level architectural intent.
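For example, a hypothetical CLAUDE.md overview for the monorepo used in the examples above might read:

```
# Project overview
pnpm monorepo. packages/api is the Express backend, packages/web is the
React frontend, and packages/shared holds the types both of them import.
Authentication lives in packages/core/auth. Generated output under
packages/*/dist is never edited by hand.
```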
Scope your prompts. Instead of "fix the bug," say "fix the session timeout bug in packages/auth." The more specific the starting point, the faster the graph retrieval and the less exploration needed.
Commit your manifest. The `.vexp/manifest.json` file should be in git. Team members who clone the repo can run `vexp init` and get an index based on the shared manifest, without re-analyzing the full codebase from scratch.
Keep sessions focused. Even with a dependency graph, large codebase sessions benefit from the one-task-per-session discipline. Cross-module context from one task rarely helps with the next.
The core insight is simple: Claude Code's capabilities don't degrade on large codebases — its ability to find and load the right context does. Give it structural awareness through a dependency graph, and the 200K-line monorepo experience starts to feel like the 5K-line project experience. The agent knows where things are, understands how they connect, and makes changes with confidence backed by verified relationships rather than guesses.
Nicola
Developer and creator of vexp — a context engine for AI coding agents. I build tools that make AI coding assistants faster, cheaper, and actually useful on real codebases.