90% of Claude Code Tokens Are Wasted on Exploration — Here's Proof

If you lean on Claude Code for serious work, most of your tokens are not doing what you think they are. They’re not spent on reasoning about your problem or editing the right files. They’re paying for the agent to wander your codebase and drag around a bloated chat history.
This isn’t a quirk of Claude; it’s a structural property of any coding agent that:
- Lacks a pre-built code graph or index
- Navigates the filesystem on demand
- Accumulates full conversational history in every call
Below is how that waste shows up, why it degrades quality (not just cost), and what you can do to claw back both.
The Exploration Tax
When you ask Claude Code to debug or implement anything non-trivial, it starts blind. Without a dependency graph, the only viable strategy is incremental exploration:
- Read a file or directory to get bearings
- Notice a reference to another file
- Open that file
- Follow more references
- Repeat until it probably has enough context
This is rational behavior in the absence of structure, but it’s extremely token-hungry.
For a moderately complex task in a typical mid-sized codebase, 60–80% of input tokens on each exploratory turn can be consumed just by reading files that turn out to be marginal or irrelevant.
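The exploration loop above can be sketched as a toy breadth-first crawl. Everything here is illustrative: the file names, token sizes, and reference structure are assumptions, not measurements from a real session.

```python
# Toy model of incremental exploration: hypothetical codebase mapping
# file -> (approx token cost to read it, files it references).
CODEBASE = {
    "checkout.ts": (3500, ["payment-service.ts"]),
    "payment-service.ts": (4000, ["user-payment-methods.ts"]),
    "user-payment-methods.ts": (2800, []),
}

def explore(start: str) -> int:
    """Follow references breadth-first, summing tokens spent reading."""
    seen, queue, tokens = set(), [start], 0
    while queue:
        f = queue.pop(0)
        if f in seen:
            continue
        seen.add(f)
        size, refs = CODEBASE[f]
        tokens += size          # every hop pays the full cost of the file
        queue.extend(refs)
    return tokens

print(explore("checkout.ts"))   # 3500 + 4000 + 2800 = 10300
```

The cost is linear in files touched, and without a graph the agent cannot know in advance which hops will matter.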
A Concrete Example: Checkout Bug
Developer question:
"Why is the checkout flow failing for users with stored payment methods?"
Without a code graph, a plausible exploration path might look like this:
- Read `src/checkout/` directory listing → ~200 tokens
- Read `checkout.ts` (main file) → ~3,500 tokens
- Discover it calls `PaymentService` → read `payment-service.ts` → ~4,000 tokens
- Discover it calls `UserPaymentMethods` → read `user-payment-methods.ts` → ~2,800 tokens
- Suspect auth issue → read `auth/session.ts` → ~1,900 tokens
- Check API handler → read `api/checkout.ts` → ~2,100 tokens
- Inspect logging → read `lib/logger.ts` → ~1,200 tokens
- Check types → read `types/payment.ts` → ~800 tokens
Total exploration tokens: ~16,500
Tokens truly relevant to the bug: maybe ~6,000 (3–4 files near the actual issue)
Exploration overhead: ~63%
And this ignores prior conversation history. In a real session, you might already be dragging 20,000–40,000 tokens of previous turns.
By the time the model actually reasons about the bug, it may be operating over 36,000+ tokens where <20% is signal.
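The arithmetic above is easy to verify. The read sizes are the per-file estimates from the exploration path, and the 20,000-token history figure is the low end of the range quoted above:

```python
# Reproducing the checkout-example arithmetic.
reads = [200, 3500, 4000, 2800, 1900, 2100, 1200, 800]
exploration = sum(reads)                 # 16,500 tokens read in total
relevant = 6000                          # ~3-4 files near the actual bug

print(1 - relevant / exploration)        # ~0.636 -> ~63% exploration overhead

history = 20_000                         # low end of accumulated session history
signal = relevant / (exploration + history)
print(signal)                            # ~0.16 -> under 20% of context is signal
```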
The History Tax
Exploration is only half the story. The other half is history accumulation.
Most coding sessions are iterative:
- You ask a question
- Claude reads files, proposes a change
- You refine, correct, and iterate
Each turn’s content is appended to the context for the next call. That means every new request includes:
- The system prompt
- The full conversation so far
- Any files previously pasted or read
In a long session, token usage per call can look like this:
| Turn | Tokens Sent |
|------|-------------|
| 1 | 5,000 |
| 5 | 22,000 |
| 10 | 48,000 |
| 20 | 95,000 |
| 30 | 140,000+ |
By turn 30, you might send 140,000 tokens per API call. The new question at turn 30 might be 500 tokens. The other 139,500 tokens are history, most of which is no longer relevant.
That’s 99.6% overhead on that call.
Even in shorter, more disciplined sessions, it’s common for history overhead to exceed 50% after a handful of turns.
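A toy linear growth model reproduces the table's trajectory. The ~4,800-tokens-per-turn rate is an assumed average chosen to roughly match those numbers, not a measurement; real growth is lumpier.

```python
# Toy model: each turn appends ~4,800 tokens of history to the context.
def tokens_sent(turn: int, base: int = 5_000, per_turn: int = 4_800) -> int:
    return base + (turn - 1) * per_turn

for t in (1, 5, 10, 20, 30):
    print(t, tokens_sent(t))             # roughly tracks the table above

# At turn 30, a 500-token question rides on ~140k tokens of context:
print(1 - 500 / tokens_sent(30))         # > 0.99 -> nearly all history
```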
See also: Context Rot: Why Claude Code Gets Worse the Longer You Chat
Putting It Together: Total Waste Estimate
For a heavy Claude Code session (20+ turns) on a complex task, you typically pay two taxes simultaneously:
- Exploration overhead:
60–70% of tokens in file-reading turns go to exploring files that aren’t directly relevant.
- History overhead:
Starts around ~30% by turn 5, grows to ~90%+ by turn 20 as old context accumulates.
Combine them and you get a blended waste estimate:
- Typical heavy session: 70–85% of tokens are not directly relevant to the current question
- Aggressive headline scenario (long, exploratory, multi-problem sessions): 75–90% waste is realistic
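As a rough sanity check, multiplying the two taxes together lands in that range. The values of `h` and `e` below are assumed mid-range figures from this section's estimates, not measurements:

```python
# Blended waste from the two taxes combined.
h = 0.60   # assumed: fraction of the prompt that is accumulated history
e = 0.65   # assumed: fraction of fresh file reads that are irrelevant

signal = (1 - h) * (1 - e)    # relevant fraction of the whole prompt
print(1 - signal)             # 0.86 -> ~86% blended waste
```

Push `h` toward the ~90% seen by turn 20 and the blended figure climbs into the headline 90% range.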
In a FastAPI benchmark, introducing a graph-based context engine cut token usage by 65–70% while improving answer quality. In other words, those tokens were never helping; they were pure overhead.
For practical techniques, see: How to Reduce Claude Code Token Usage by 58%
Why Token Waste Hurts More Than Your Bill
This isn’t just a cost optimization problem. Excess tokens actively degrade model performance.
1. Noise Degrades Accuracy
When 70–80K of the tokens in context are irrelevant, the model must:
- Parse and embed a large amount of noise
- Maintain internal consistency across conflicting or outdated snippets
- Guess which parts of the context you still care about
This leads to:
- Higher hallucination risk
- Longer, more hedged answers
- Occasional reliance on stale or superseded information
2. History Creates Contradictions
Long sessions often contain:
- Old approaches you’ve abandoned
- Outdated code that’s since been changed
- Partial refactors that were never completed
If all of that remains in context, Claude can:
- Mix old and new designs in its reasoning
- Suggest patterns you explicitly rejected 10 turns ago
- Re-open dead ends because they’re still visible in the prompt
3. Exploration Burns the Context Window
Every file read consumes part of the context window. After 10–15 large files, you’re often at half or more of the available window before the model has even started serious reasoning.
Consequences:
- Less room for new code and explanations later in the session
- More aggressive truncation of earlier, possibly important details
- Subtle “context rot” as the model loses sight of the original problem
What Reduces Waste the Most
Here’s what actually moves the needle, ranked by impact.
1. Graph-Based Context Extraction (Highest Impact)
Problem: Without a code graph, Claude must explore the filesystem dynamically.
Solution: Build a dependency-aware index of your codebase and feed Claude only the relevant slices.
With a code graph or context engine, you can:
- Identify the minimal set of files relevant to a query
- Include only the pivots (directly referenced files) and their neighbors
- Avoid directory walks and speculative file reads entirely
In practice, this can shrink context from 40,000+ tokens of broad exploration to 8,000–15,000 tokens of high-relevance content.
This is the architectural fix. It doesn’t just reduce cost; it improves signal-to-noise and answer quality.
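As a rough illustration of the idea: pre-build the import graph once, then answer each query with only the seed file and its neighbors. The file contents, the import-parsing regex, and the function names below are invented for this sketch; a real context engine would use a proper parser rather than a regex.

```python
import re

# Hypothetical codebase: file -> source text (illustrative only).
FILES = {
    "checkout.ts": "import { pay } from './payment-service';",
    "payment-service.ts": "import { methods } from './user-payment-methods';",
    "user-payment-methods.ts": "export const methods = [];",
    "lib/logger.ts": "export const log = console.log;",
}

IMPORT_RE = re.compile(r"from '\./([\w\-/]+)'")

def build_graph(files):
    """file -> files it imports. Built once up front, not per query."""
    return {
        name: [m + ".ts" for m in IMPORT_RE.findall(src)]
        for name, src in files.items()
    }

def relevant_slice(graph, seed, depth=1):
    """The seed file plus its neighbors up to `depth` hops: pivots only."""
    frontier, seen = {seed}, {seed}
    for _ in range(depth):
        frontier = {n for f in frontier for n in graph.get(f, [])} - seen
        seen |= frontier
    return seen

graph = build_graph(FILES)
print(sorted(relevant_slice(graph, "checkout.ts", depth=1)))
# lib/logger.ts is never read: it is unreachable from the seed
```

The slice is computed from the graph, so unrelated files like the logger never enter the context at all, no matter how long the session runs.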
For deeper patterns, see: Context Engineering for AI Coding Agents
2. Fresh Sessions for New Problems
Rule of thumb: New task = new session.
Instead of:
- Using one mega-thread for “all backend work this week”
Prefer:
- One session per bug or feature
- Short, focused conversations that end when the task ends
This keeps history overhead low and prevents context rot.
3. Precise File References
When you already know where the problem likely lives, point Claude directly at it. For example, instead of "the checkout flow is broken," say "read src/checkout/checkout.ts and payment-service.ts — the failure involves stored payment methods." One precise pointer replaces an entire exploration phase.
Nicola
Developer and creator of vexp — a context engine for AI coding agents. I build tools that make AI coding assistants faster, cheaper, and actually useful on real codebases.
Related Articles

Vibe Coding Is Fun Until the Bill Arrives: Token Optimization Guide
Vibe coding with AI is addictive but expensive. Freestyle prompting without context management burns tokens 3-5x faster than structured workflows.

Windsurf Credits Running Out? How to Use Fewer Tokens Per Task
Windsurf credits deplete fast because the AI processes too much irrelevant context. Reduce what it needs to read and your credits last 2-3x longer.

Best AI Coding Tool for Startups: Balancing Cost, Speed, and Quality
Startups need speed and budget control. The ideal AI coding stack combines a free/cheap agent with context optimization — here's how to set it up.