Your AI Coding Agent Reads Too Many Files — Here's the Fix

You ask an AI coding agent a question about a single function. It responds by reading 15 files. By the time it answers, your context window is half-full, you’ve spent real money on tokens, and the final explanation only needed 3 of those files.
This isn’t a bug in Claude Code or any other agent. It’s structural: without a code graph, the agent has no reliable way to know which files are relevant before it reads them. So it reads broadly and hopes.
This post explains why that happens, what it costs, and how graph-based context fixes it.
Why Agents Read Too Many Files
When you give an AI coding agent a task, it needs context about your codebase. It typically has three options:
- Ask you to provide the relevant files manually
- Search the filesystem for potentially relevant files
- Read files based on heuristics (e.g. directories that sound relevant)
Each option has serious drawbacks.
1. Manual file selection
You tell the agent exactly which files matter. This works, but only if you already understand the dependency surface of the problem.
For simple bugs, you often do: you know the function, the module, and maybe one or two helpers. For cross-module issues, refactors, or unfamiliar code, you usually don’t know the full blast radius.
2. Filesystem search
The agent searches by filename or directory structure. That’s a weak proxy for actual dependencies.
A function you care about might live in utils/helpers.ts alongside a dozen unrelated utilities. A filename match doesn’t tell you which symbols are actually involved in the behavior you’re debugging.
3. Heuristic directory reads
This is the most common pattern:
“The bug is in the payment module, so let me read everything in src/payments/.”

The agent lists the directory, then reads file after file because it has no better signal than “this folder name sounds relevant.” That’s how you end up with 15-file reads for a question that truly depends on 3–5 files.
The core problem: none of these approaches use the real signal — the dependency structure of your code.
- Which functions call which functions?
- Which types are used where?
- Which modules actually participate in this behavior?
Without a dependency graph, the agent is navigating your codebase blind.
For a deeper dive into what a dependency graph is and how it works for AI coding, see:
What Is a Dependency Graph for AI Coding?
The Cost of Over-Reading
Over-reading isn’t just an aesthetic problem. It has three concrete costs: tokens, context quality, and time.
1. Token cost
Every file read consumes tokens. A typical TypeScript file is ~2,000–4,000 tokens. Reading 15 such files means 30,000–60,000 tokens of raw code in your context.
At Claude Sonnet pricing, that’s roughly $0.09–$0.18 in input tokens per task just for file content, before any conversation history.
If a developer does 20 such tasks per day with over-reading, that’s $1.80–$3.60/day in unnecessary file-reading costs. At team scale, this compounds into thousands of dollars per month.
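To make the arithmetic concrete, here is a quick sketch, assuming ~3,000 tokens per file (the midpoint of the range above) and a Sonnet-class input rate of $3 per million tokens (an assumption; check current pricing before relying on these numbers):

```python
# Back-of-the-envelope cost of over-reading.
# Assumptions: ~3,000 tokens per TypeScript file, $3 per million input tokens.

TOKENS_PER_FILE = 3_000          # midpoint of the 2,000-4,000 range
PRICE_PER_MILLION_INPUT = 3.00   # USD, assumed Sonnet-class input rate

def file_read_cost(files_read: int) -> float:
    """Dollar cost of the raw file content read for one task."""
    tokens = files_read * TOKENS_PER_FILE
    return tokens * PRICE_PER_MILLION_INPUT / 1_000_000

# 15 files read when only 4 were needed: 11 unnecessary files per task.
per_task_waste = file_read_cost(15) - file_read_cost(4)
daily_waste = per_task_waste * 20  # 20 such tasks per day

print(f"wasted per task: ${per_task_waste:.3f}")  # $0.099
print(f"wasted per day:  ${daily_waste:.2f}")     # $1.98
```

That lands inside the $1.80–$3.60/day range above, and it counts only raw file content, not conversation history or output tokens.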
2. Context quality cost
More context is not always better context. Irrelevant files are noise. They:
- Dilute the signal from the truly relevant code
- Increase the chance the model latches onto the wrong pattern
- Make it harder for the agent to keep the right invariants in working memory
Counterintuitively, less but more relevant context often produces better answers than a huge, noisy context window.
3. Time cost
Each file read is a tool call. Tool calls typically add 500–2000ms of latency.
Reading 15 files can easily add 7–30 seconds before you see any answer at all. For interactive development, that latency is painful and breaks flow.
What Good File Selection Looks Like
For any given question about your code, there’s a minimal set of files that contains everything needed to answer correctly.
- For a typical bug fix, that’s usually 3–5 files
- For a feature addition touching multiple modules, maybe 7–10 files
Good file selection is about discovering that minimal set using the dependency graph, starting from the symbols your question touches.
A good process looks like this:
1. Identify pivot symbols
Extract the functions, classes, and types directly mentioned or implied by your question. These are the pivots.
2. Traverse outward along the graph
From each pivot, follow edges:
- What do these pivots call?
- What calls them?
- What types do they use or return?
3. Apply depth limits
Don’t traverse to depth 10. In practice, 2–3 hops from the pivots captures the essential context for most tasks.
4. Compress intelligently
- For pivots and very close neighbors: include full bodies
- For more distant context: include signatures, type definitions, or summaries only
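The pivot-and-traverse process above can be sketched with a plain breadth-first search over an adjacency map. Everything here is illustrative: the graph, the symbol names, and the two compression tiers are assumptions, not output from a real indexer.

```python
from collections import deque

# Hypothetical dependency graph: symbol -> directly related symbols
# (callees, callers, and referenced types collapsed into one edge set).
GRAPH = {
    "processPayment": ["validateCard", "PaymentError", "chargeGateway"],
    "chargeGateway": ["GatewayClient", "retryPolicy"],
    "validateCard": ["CardSchema"],
    "retryPolicy": ["backoff"],
    "CardSchema": [], "PaymentError": [], "GatewayClient": [], "backoff": [],
}

def context_slice(pivots, max_hops=2):
    """Collect symbols within max_hops of any pivot, tagged by tier."""
    dist = {p: 0 for p in pivots}
    queue = deque(pivots)
    while queue:
        sym = queue.popleft()
        if dist[sym] == max_hops:          # depth limit: stop expanding here
            continue
        for nbr in GRAPH.get(sym, []):
            if nbr not in dist:
                dist[nbr] = dist[sym] + 1
                queue.append(nbr)
    # Compression tiers: full bodies near the pivots, signatures further out.
    return {s: ("full body" if d <= 1 else "signature only")
            for s, d in dist.items()}

slice_ = context_slice(["processPayment"], max_hops=2)
# "backoff" is 3 hops out, so the depth limit excludes it entirely.
```

A real engine would keep call, caller, and type edges separate and weight them differently, but the shape of the traversal is the same.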
The result:
- Instead of 15 files, you might include 4
- Instead of 45,000 tokens, you might include 8,000
- The agent sees exactly what matters, with far less noise
Three Techniques to Reduce Over-Reading Right Now
You don’t need a full-blown context engine to start improving file selection. Here are three practical techniques you can use immediately.
1. Specify which files matter
When you give a task, explicitly constrain the scope:
“The bug is in src/auth/session.ts. It calls validateToken in src/auth/jwt.ts. Both files are relevant; nothing else should be needed.”
For tasks you understand reasonably well, this is ~20 seconds of typing that can eliminate 10+ unnecessary file reads.
2. Use file-level, not directory-level context
When using @mentions or file references with Claude Code (or similar tools), reference specific files, not directories.
- ✅ @src/auth/session.ts
- ✅ @src/auth/jwt.ts
- ❌ @src/auth/
Referencing a directory usually triggers a directory listing followed by broad file reads. Referencing specific files keeps the agent focused.
3. Set explicit constraints
Agents respond well to explicit scope boundaries. For example:
“The issue is isolated to processPayment. Please don’t read files outside of the payment module unless you have a specific, stated reason.”

Or:
“You may read at most 3 additional files beyond the ones I’ve provided. If you think you need more, explain why first.”
These constraints nudge the agent away from reflexive over-reading and toward more deliberate navigation.
For more patterns like this, see:
How to Give Your AI Coding Agent Better Context (Automatically)
The Automated Fix: Graph-Based Context
Manual techniques help when you already understand your codebase. But for:
- Debugging unfamiliar or legacy code
- Working in large, inherited monorepos
- Refactoring cross-cutting concerns
…you need automated, graph-based context selection.
Here’s how that works.
Indexing phase
A code graph is built from your repository:
- Every function, class, and type becomes a node
- Every call, import, and type reference becomes an edge
This indexing runs once and then updates incrementally as your code changes.
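As a rough illustration of the indexing phase, here is a minimal sketch using Python’s stdlib ast module to turn one module’s functions into nodes and its direct calls into edges. The SOURCE snippet is hypothetical, and a real indexer would also track imports, methods, classes, type references, and other languages.

```python
import ast

# Hypothetical module to index.
SOURCE = """
def validate(token):
    return token.startswith("v1.")

def load_session(token):
    if validate(token):
        return {"token": token}

def handler(request):
    return load_session(request)
"""

def build_graph(source: str):
    """Extract function nodes and direct call edges from one module."""
    tree = ast.parse(source)
    nodes, edges = set(), set()
    for fn in (n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)):
        nodes.add(fn.name)
        for call in (c for c in ast.walk(fn) if isinstance(c, ast.Call)):
            if isinstance(call.func, ast.Name):  # plain `foo(...)` calls only
                edges.add((fn.name, call.func.id))
    return nodes, edges

nodes, edges = build_graph(SOURCE)
# nodes: {'validate', 'load_session', 'handler'}
# edges: {('load_session', 'validate'), ('handler', 'load_session')}
```

Incremental updates then mean re-running this extraction only on files that changed and patching the affected nodes and edges, rather than rebuilding the whole graph.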
Query phase
When you ask a question, the system:
- Parses your query to identify pivot symbols (e.g. processPayment, UserSession, PaymentError)
- Maps those pivots into the graph
- Traverses outward along call, import, and type edges
High-centrality nodes (code that is widely used or frequently called) are prioritized, because they’re more likely to be relevant to behavior.
Filtering phase
The collected symbols are then filtered and ranked:
- Nodes more than 3 hops from any pivot are usually excluded
- Nodes that are rarely used or clearly unrelated are deprioritized
The output is a precise, token-efficient context slice: the minimal set of symbols that captures the behavior you’re asking about.
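A minimal sketch of this filtering step, assuming hop distances and usage counts (in-degree, a simple stand-in for centrality) have already been computed. Both maps below are made-up numbers:

```python
# Hypothetical precomputed data: hops from the nearest pivot, and how many
# other symbols reference each symbol (a crude degree-centrality score).
hops = {"processPayment": 0, "chargeGateway": 1, "GatewayClient": 2,
        "backoff": 4, "formatDate": 2}
in_degree = {"processPayment": 3, "chargeGateway": 5, "GatewayClient": 12,
             "backoff": 2, "formatDate": 1}

def rank_slice(hops, in_degree, max_hops=3):
    """Drop symbols too far from any pivot, then rank the rest."""
    kept = [s for s, d in hops.items() if d <= max_hops]
    # Closer to a pivot first; widely-referenced symbols break ties.
    return sorted(kept, key=lambda s: (hops[s], -in_degree[s]))

print(rank_slice(hops, in_degree))
# ['processPayment', 'chargeGateway', 'GatewayClient', 'formatDate']
```

Note that backoff is cut by the hop limit, and GatewayClient outranks formatDate at the same distance because it is referenced far more widely.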
Delivery phase
Finally, the context slice is formatted for the AI agent:
- Pivots: full code bodies
- Close neighbors: full bodies or detailed snippets
- Distant but important nodes: signatures, type definitions, or short summaries
- Everything else: omitted
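A sketch of that tiered formatting, with hypothetical symbol records standing in for real indexer output:

```python
# Each symbol arrives with a tier assigned during filtering.
symbols = [
    {"name": "processPayment", "tier": "pivot",
     "body": "def processPayment(order):\n    ...",
     "signature": "processPayment(order) -> Receipt"},
    {"name": "GatewayClient", "tier": "distant",
     "body": "class GatewayClient:\n    ...",
     "signature": "class GatewayClient"},
]

def render(symbols):
    """Format a context slice: full bodies near pivots, signatures elsewhere."""
    parts = []
    for s in symbols:
        if s["tier"] in ("pivot", "close"):
            parts.append(f"# {s['name']} (full)\n{s['body']}")
        else:
            parts.append(f"# {s['name']} (signature)\n{s['signature']}")
    return "\n\n".join(parts)

print(render(symbols))
```

The agent receives one compact text block instead of fifteen raw files, which is what makes the token numbers in the benchmark below possible.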
This entire process runs in under a second. The agent never needs to “spray and pray” across your filesystem because it’s operating on a precomputed map of your code.
Benchmark: Targeted vs. Broad File Reading
On a FastAPI application with 7 representative coding tasks, we compared broad reading vs. graph-based selection.
Broad reading (no graph):
- Average files read per task: 11.3
- Average input tokens: ~48,000
- Task completion time: baseline
- Task success rate: baseline
Graph-based selection:
- Average files read per task: 3.8
- Average input tokens: ~14,000 (65–70% fewer)
- Task completion time: 22% faster
- Task success rate: higher (fewer hallucinations from noisy context)
The key finding: quality improved even though the agent saw less code. Targeted context beats broad context on cost, latency, and correctness.
For a deeper breakdown of token savings and patterns, see:
How to Reduce Claude Code Token Usage by 58%
The Long-Term Cost of the Status Quo
Over-reading compounds over time:
- Every unnecessary file adds tokens
- More tokens increase cost and dilute context
- Diluted context leads to weaker answers and more iterations
- More iterations trigger more file reads
A team of 5 developers with heavy AI usage and unconstrained file reading can easily spend $2,000–$5,000/month on unnecessary token usage alone.
This isn’t a problem you can “discipline” your way out of. As long as the agent lacks a code graph, it will keep guessing which files to read.
The fix is architectural, not behavioral:
- Give the agent a dependency graph
- Let it select context via graph traversal
- Use manual constraints as a safety rail, not the primary mechanism
FAQ
Can’t the agent just get better at picking files?
Models have improved at code navigation, but without a pre-built graph they face a fundamental limit: they must read something to know whether it’s relevant.
A pre-built graph breaks that loop by letting the agent reason over structure without first ingesting every file.
Does over-reading affect quality or just cost?
Both.
Over-reading is like reading 15 books to answer a question that’s fully answered in 3 of them. You can find the answer, but:
- It takes longer
- You’re more likely to get distracted by irrelevant details
- You may misinterpret the key signal
The same thing happens with large, noisy contexts.
How is this different from just having a good codebase structure?
Good directory and module structure helps, but it’s not enough.
Real-world codebases have:
- Cross-cutting concerns (logging, auth, metrics)
- Shared utilities and helpers
- Types and interfaces reused across modules
These relationships don’t always align with directory boundaries. A graph captures actual dependencies, not just organizational intent.
What if my codebase is small?
For repos under ~100 files, over-reading is less painful:
- Token costs are lower
- The risk of severe noise is smaller
But as your codebase grows, the cost curve steepens. Starting with graph-based context early helps you avoid expensive habits later.
Does this work for polyglot repos?
Yes, if your graph engine supports multiple languages.
For example, vexp supports TypeScript, JavaScript, Python, Go, Rust, Java, C#, and others. Mixed-language repos benefit from cross-language dependency tracking — e.g. tracing a request from a TypeScript frontend through a Python backend into a Go microservice.
Nicola
Developer and creator of vexp — a context engine for AI coding agents. I build tools that make AI coding assistants faster, cheaper, and actually useful on real codebases.