90% of Claude Code Tokens Are Wasted on Exploration — Here's Proof

If you lean on Claude Code for serious work, most of your tokens are not doing what you think they are. They’re not spent on reasoning about your problem or editing the right files. They’re paying for the agent to wander your codebase and drag around a bloated chat history.

This isn’t a quirk of Claude; it’s a structural property of any coding agent that:

  • Lacks a pre-built code graph or index
  • Navigates the filesystem on demand
  • Accumulates full conversational history in every call

Below is how that waste shows up, why it degrades quality (not just cost), and what you can do to claw back both.

The Exploration Tax

When you ask Claude Code to debug or implement anything non-trivial, it starts blind. Without a dependency graph, the only viable strategy is incremental exploration:

  1. Read a file or directory to get bearings
  2. Notice a reference to another file
  3. Open that file
  4. Follow more references
  5. Repeat until it probably has enough context

This is rational behavior in the absence of structure, but it’s extremely token-hungry.

For a moderately complex task in a typical mid-sized codebase, 60–80% of input tokens on each exploratory turn can be consumed just by reading files that turn out to be marginal or irrelevant.

A Concrete Example: Checkout Bug

Developer question:

"Why is the checkout flow failing for users with stored payment methods?"

Without a code graph, a plausible exploration path might look like this:

  1. Read src/checkout/ directory listing → ~200 tokens
  2. Read checkout.ts main file → ~3,500 tokens
  3. Discover it calls PaymentService → read payment-service.ts → ~4,000 tokens
  4. Discover it calls UserPaymentMethods → read user-payment-methods.ts → ~2,800 tokens
  5. Suspect auth issue → read auth/session.ts → ~1,900 tokens
  6. Check API handler → read api/checkout.ts → ~2,100 tokens
  7. Inspect logging → read lib/logger.ts → ~1,200 tokens
  8. Check types → read types/payment.ts → ~800 tokens

Total exploration tokens: ~16,500

Tokens truly relevant to the bug: maybe ~6,000 (3–4 files near the actual issue)

Exploration overhead: ~63%
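The arithmetic above can be sanity-checked in a few lines. The token counts are the illustrative estimates from the walkthrough, not measurements from a real session:

```python
# Back-of-the-envelope accounting for the checkout-bug exploration above.
# Every number here is the illustrative estimate from the walkthrough.
reads = {
    "src/checkout/ listing": 200,
    "checkout.ts": 3_500,
    "payment-service.ts": 4_000,
    "user-payment-methods.ts": 2_800,
    "auth/session.ts": 1_900,
    "api/checkout.ts": 2_100,
    "lib/logger.ts": 1_200,
    "types/payment.ts": 800,
}

total = sum(reads.values())       # everything the agent read
relevant = 6_000                  # the ~3-4 files near the actual bug
overhead = 1 - relevant / total   # fraction spent on marginal files

print(f"total exploration: {total:,} tokens")
print(f"overhead: {overhead:.1%}")
```

Swap in your own estimates and the shape of the result rarely changes: most of what the agent reads never matters.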

And this ignores prior conversation history. In a real session, you might already be dragging 20,000–40,000 tokens of previous turns.

By the time the model actually reasons about the bug, it may be operating over 36,000+ tokens where <20% is signal.

The History Tax

Exploration is only half the story. The other half is history accumulation.

Most coding sessions are iterative:

  • You ask a question
  • Claude reads files, proposes a change
  • You refine, correct, and iterate

Each turn’s content is appended to the context for the next call. That means every new request includes:

  • The system prompt
  • The full conversation so far
  • Any files previously pasted or read

In a long session, token usage per call can look like this:

| Turn | Tokens Sent |
|------|-------------|
| 1    | 5,000       |
| 5    | 22,000      |
| 10   | 48,000      |
| 20   | 95,000      |
| 30   | 140,000+    |

By turn 30, you might send 140,000 tokens per API call. The new question at turn 30 might be 500 tokens. The other 139,500 tokens are history, most of which is no longer relevant.

That’s 99.6% overhead on that call.

Even in shorter, more disciplined sessions, it’s common for history overhead to exceed 50% after a handful of turns.
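The growth pattern in the table falls out of a toy model: each turn appends its request and response to the context, so tokens sent grows roughly linearly with turn count. The per-turn sizes below are assumptions for illustration and won't match any particular session exactly:

```python
# Toy model of history accumulation. Assumed sizes, for illustration only.
SYSTEM_PROMPT = 2_000     # resent on every call
AVG_TURN_PAYLOAD = 4_500  # question + file reads + response, appended per turn
NEW_QUESTION = 500        # the only genuinely new content at turn n

def tokens_sent(turn: int) -> int:
    history = AVG_TURN_PAYLOAD * (turn - 1)  # everything from prior turns
    return SYSTEM_PROMPT + history + NEW_QUESTION

def history_overhead(turn: int) -> float:
    sent = tokens_sent(turn)
    return (sent - NEW_QUESTION) / sent      # share that isn't the new question

for turn in (1, 5, 10, 20, 30):
    print(f"turn {turn:>2}: ~{tokens_sent(turn):>7,} sent, "
          f"{history_overhead(turn):.0%} overhead")
```

The exact constants don't matter much: because history grows linearly while the new question stays constant, overhead asymptotically approaches 100%.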

See also: Context Rot: Why Claude Code Gets Worse the Longer You Chat

Putting It Together: Total Waste Estimate

For a heavy Claude Code session (20+ turns) on a complex task, you typically pay two taxes simultaneously:

  • Exploration overhead: 60–70% of tokens in file-reading turns go to exploring files that aren’t directly relevant.
  • History overhead: starts at roughly 30% by turn 5 and grows past 90% by turn 20 as old context accumulates.

Combine them and you get a blended waste estimate:

  • Typical heavy session: 70–85% of tokens are not directly relevant to the current question
  • Aggressive headline scenario (long, exploratory, multi-problem sessions): 75–90% waste is realistic
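The two taxes combine on every late-session call. A rough way to see the blended number, using assumed relevance fractions in the ranges quoted above:

```python
# Blended waste for a single late-session call. All fractions and token
# counts are assumed, illustrative values in the ranges quoted above.
file_read_tokens = 16_500     # this turn's exploration reads
exploration_relevant = 0.35   # ~65% of those reads are marginal
history_tokens = 40_000       # accumulated prior turns
history_relevant = 0.10       # small slice of history still matters
question_tokens = 500         # the actual new question

useful = (file_read_tokens * exploration_relevant
          + history_tokens * history_relevant
          + question_tokens)
total = file_read_tokens + history_tokens + question_tokens
waste = 1 - useful / total

print(f"blended waste: {waste:.0%}")
```

With these inputs the blended waste lands right in the 70–85% band; push the history larger or the exploration broader and it climbs toward 90%.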

In a FastAPI benchmark where we introduced a graph-based context engine, input tokens dropped by 65–70% while answer quality improved. Those tokens were not helping; they were pure overhead.

For practical techniques, see: How to Reduce Claude Code Token Usage by 58%

Why Token Waste Hurts More Than Your Bill

This isn’t just a cost optimization problem. Excess tokens actively degrade model performance.

1. Noise Degrades Accuracy

When 70–80K of the tokens in context are irrelevant, the model must:

  • Parse and embed a large amount of noise
  • Maintain internal consistency across conflicting or outdated snippets
  • Guess which parts of the context you still care about

This leads to:

  • Higher hallucination risk
  • Longer, more hedged answers
  • Occasional reliance on stale or superseded information

2. History Creates Contradictions

Long sessions often contain:

  • Old approaches you’ve abandoned
  • Outdated code that’s since been changed
  • Partial refactors that were never completed

If all of that remains in context, Claude can:

  • Mix old and new designs in its reasoning
  • Suggest patterns you explicitly rejected 10 turns ago
  • Re-open dead ends because they’re still visible in the prompt

3. Exploration Burns the Context Window

Every file read consumes part of the context window. After 10–15 large files, you’re often at half or more of the available window before the model has even started serious reasoning.

Consequences:

  • Less room for new code and explanations later in the session
  • More aggressive truncation of earlier, possibly important details
  • Subtle “context rot” as the model loses sight of the original problem

What Reduces Waste the Most

Here’s what actually moves the needle, ranked by impact.

1. Graph-Based Context Extraction (Highest Impact)

Problem: Without a code graph, Claude must explore the filesystem dynamically.

Solution: Build a dependency-aware index of your codebase and feed Claude only the relevant slices.

With a code graph or context engine, you can:

  • Identify the minimal set of files relevant to a query
  • Include only the pivots (directly referenced files) and their neighbors
  • Avoid directory walks and speculative file reads entirely

In practice, this can shrink context from 40,000+ tokens of broad exploration to 8,000–15,000 tokens of high-relevance content.
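A minimal sketch of the idea, using the checkout example's files. The dependency graph here is a hand-written dict standing in for a real index (a real context engine builds it by parsing imports and call sites); the selection is a bounded breadth-first walk from the pivot files:

```python
from collections import deque

# Hypothetical dependency graph for the checkout example: file -> files
# it references. A real engine derives this by parsing imports/calls.
DEPS = {
    "checkout.ts": ["payment-service.ts", "user-payment-methods.ts"],
    "payment-service.ts": ["types/payment.ts"],
    "user-payment-methods.ts": ["types/payment.ts"],
    "api/checkout.ts": ["checkout.ts"],
    "auth/session.ts": [],
    "lib/logger.ts": [],
    "types/payment.ts": [],
}

def context_slice(pivots, depth=1):
    """Return the pivot files plus neighbors up to `depth` hops away."""
    selected = set(pivots)
    frontier = deque((p, 0) for p in pivots)
    while frontier:
        node, d = frontier.popleft()
        if d == depth:
            continue
        for dep in DEPS.get(node, []):
            if dep not in selected:
                selected.add(dep)
                frontier.append((dep, d + 1))
    return selected

# Query mentions checkout + stored payment methods -> pivot on checkout.ts.
# Selects 3 files instead of walking all 7.
print(sorted(context_slice(["checkout.ts"])))
```

The payoff: `auth/session.ts` and `lib/logger.ts`, which the blind exploration path read speculatively, never enter the context at all.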

This is the architectural fix. It doesn’t just reduce cost; it improves signal-to-noise and answer quality.

For deeper patterns, see: Context Engineering for AI Coding Agents

2. Fresh Sessions for New Problems

Rule of thumb: New task = new session.

Instead of:

  • Using one mega-thread for “all backend work this week”

Prefer:

  • One session per bug or feature
  • Short, focused conversations that end when the task ends

This keeps history overhead low and prevents context rot.

3. Precise File References

When you already know where the problem likely lives, point Claude directly at it. Instead of asking "Why is checkout failing?", say "Look at checkout.ts and payment-service.ts; the failure happens when PaymentService is called for a user with a stored payment method." A precise pointer skips the directory walk entirely.

Frequently Asked Questions

Is it true that 90% of Claude Code tokens are wasted on exploration?

In typical unoptimized sessions on mid-to-large codebases, 65–90% of input tokens go to files and functions that are not directly referenced in the model's response. The exact percentage depends on codebase size and how broadly the agent searches; larger codebases with more files tend to see higher waste ratios.

Why does Claude Code explore so many irrelevant files?

Claude Code explores broadly because it doesn't have a pre-built map of your codebase's dependency structure. When you ask about a function, it may search for it across dozens of files, loading each one into context. Without a dependency graph, it can't know which files are actually connected to your task.

How is token waste measured in AI coding sessions?

Token waste is measured by comparing which files were loaded into context versus which files' content actually appeared in or influenced the model's response. Files that were loaded but never referenced represent wasted tokens. This can be tracked by analyzing API logs against the generated output.

What is the impact of wasted tokens beyond cost?

Wasted tokens cause three problems beyond cost: they fill the context window faster (leading to earlier context compression or truncation), they can confuse the model by presenting irrelevant patterns it might incorrectly follow, and they slow down response time since the model must attend to all tokens in context.

How can I eliminate exploration waste in Claude Code?

Use a context engine like vexp that pre-indexes your codebase into a dependency graph. Instead of Claude Code searching broadly and loading full files, vexp's run_pipeline identifies the exact functions and call sites connected to your task via graph traversal — reducing exploration waste from 65–90% down to under 30%.

Nicola

Developer and creator of vexp — a context engine for AI coding agents. I build tools that make AI coding assistants faster, cheaper, and actually useful on real codebases.
