How to Reduce Claude Code Token Usage by 58% (Without Manual Context Management)


Claude Code is great at reading your codebase—but terrible at stopping.

Ask it to fix a bug in auth, and it happily slurps in every nearby file: helpers, configs, unrelated models, and half the test suite. You pay for all of it, even if the fix lives in three functions.

This isn’t a prompt issue or a misconfiguration. It’s structural: Claude Code’s default context loader has no real understanding of your codebase’s dependency graph. It loads by proximity (directory, filename, loose heuristics), not by actual dependencies.

The result: in a 50K+ LOC production codebase, you routinely burn tens of thousands of tokens per task for a few hundred tokens of truly relevant code.

Below is a tested way to cut that waste by ~58%—without touching your prompts—by plugging a dependency-graph context engine (vexp) into Claude Code via MCP.

Why Claude Code Over-Reads Your Codebase

Claude Code’s default behavior is intentionally conservative: when you ask it to fix or add something, it tries to avoid missing any important context. In practice, that means:

  • It loads entire directories instead of specific call chains
  • It follows loose associations (e.g., any file that ever imported a related module)
  • It re-reads the same architectural files across sessions

Take a simple example: fixing a bug in an authentication function.

  • Truly relevant context: the auth function, its direct dependencies, and the test file that covers it — maybe 500–1,000 tokens.
  • What Claude Code often loads: every file in the auth directory, shared utilities, configs, unrelated models — easily 40,000+ tokens.

Your relevant-token ratio is often around 2.5%. You’re paying for 40K tokens to use 1K.

What “58% Reduction” Actually Means

On a real FastAPI production codebase, we benchmarked Claude Sonnet on 7 representative tasks.

Summary

Claude Code wastes tokens because it explores your codebase naively: it follows every import chain, reads whole files instead of relevant snippets, and has no persistent structural memory of what mattered before. This is a context selection and organization problem, not a model-quality problem.

A context engine like vexp fixes this by building and maintaining a dependency graph of your codebase (files, functions, classes, types, and their relationships). Instead of Claude Code reading 40+ files per task, vexp:

  1. Identifies relevant symbols from your task description
  2. Traverses the dependency graph (what your code calls and what calls it)
  3. Ranks nodes by importance/centrality
  4. Compresses context to only the necessary snippets
  5. Returns a focused "context capsule" for the model
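The article doesn't specify what a "context capsule" looks like internally. As a rough illustration only, it might be a ranked bundle of snippets rendered under a token budget; the field names and the 4-characters-per-token heuristic below are assumptions, not vexp's actual format:

```python
from dataclasses import dataclass, field

@dataclass
class Snippet:
    """One ranked piece of source context."""
    path: str     # file the snippet comes from
    symbol: str   # function/class name
    score: float  # relevance/centrality score
    text: str     # the code itself, trimmed to the relevant lines

@dataclass
class ContextCapsule:
    """A compact, ranked bundle of context for the model."""
    task: str
    snippets: list[Snippet] = field(default_factory=list)

    def render(self, token_budget: int = 4000) -> str:
        """Emit highest-scoring snippets first until the budget is spent
        (rough heuristic: ~4 characters per token)."""
        out, used = [], 0
        for s in sorted(self.snippets, key=lambda s: s.score, reverse=True):
            cost = len(s.text) // 4 or 1
            if used + cost > token_budget:
                break
            out.append(f"# {s.path}::{s.symbol}\n{s.text}")
            used += cost
        return "\n\n".join(out)
```

The point of the structure: ranking happens before rendering, so when the budget runs out, only the lowest-value snippets are dropped.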

In a FastAPI benchmark, this yielded:

  • 58% lower API costs
  • 65% fewer input tokens
  • 14-percentage-point higher completion rate

You can get some of these benefits manually (precise @mentions, smaller tasks, CLAUDE.md, fresh sessions), but they don’t scale and rely on you knowing the codebase deeply.

Automating with vexp via MCP lets Claude Code (and other agents) call run_pipeline to get pre-indexed, ranked, compressed context, dramatically reducing wasteful tokens while often improving answer quality.

Key Problems vexp Solves

  1. Context selection

Knowing which files, functions, and types matter before the model reads anything.

  2. Context ranking

Ordering snippets so the most important code appears first in the prompt.

  3. Context compression

Including only the relevant parts of each file instead of entire files.

These are collectively context engineering. Doing them manually is possible but brittle and time-consuming.

Manual Tactics (Baseline Improvements)

You can reduce Claude Code’s token usage today by:

  1. Specific @mentions
  • Bad: Fix the payment processing bug
  • Better: Fix the bug in PaymentProcessor.chargeCard() — @src/payments/PaymentProcessor.ts @src/types/Transaction.ts
  • Typical savings: 20–40% fewer tokens for well-scoped tasks.
  2. Smaller, scoped tasks
  • Break big refactors into concrete, function-level tasks.
  3. CLAUDE.md for structure
  • Document key modules and directories so Claude navigates faster.
  4. Fresh sessions
  • Avoid long, drifting sessions where early context becomes ineffective.
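As a concrete starting point for tactic 3, a CLAUDE.md can be as simple as a short structural map. The module names below are hypothetical, for illustration only:

```markdown
# Project structure

- src/payments/: payment flows; start at PaymentProcessor.ts
- src/auth/: login, sessions, token refresh
- src/types/: shared domain types (Transaction, User)
- tests/: mirrors src/ one-to-one; run with npm test
```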

Limitations: these depend on your knowledge, discipline, and ongoing maintenance; they don’t give the agent a structural map of your codebase.

Automated Approach: vexp Context Engine

vexp builds and maintains a dependency graph of your codebase:

  • Nodes: files, functions, classes, types
  • Edges: imports, calls, inheritance, references
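vexp's internal representation isn't shown in this article. A minimal sketch of this kind of graph, assuming simple symbol-level nodes and untyped edges, might look like:

```python
from collections import defaultdict

class DepGraph:
    """Minimal dependency graph: nodes are symbols (files, functions,
    classes, types); edges are relationships such as imports, calls,
    inheritance, and references."""

    def __init__(self):
        self.edges = defaultdict(set)    # src -> things it depends on
        self.reverse = defaultdict(set)  # dst -> things that depend on it

    def add_edge(self, src: str, dst: str) -> None:
        self.edges[src].add(dst)
        self.reverse[dst].add(src)

    def dependencies(self, node: str) -> set[str]:
        """What this node imports/calls."""
        return self.edges[node]

    def dependents(self, node: str) -> set[str]:
        """What imports/calls this node."""
        return self.reverse[node]
```

Keeping a reverse index is what makes "who calls this?" queries as cheap as "what does this call?", which both traversal directions below rely on.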

When Claude Code receives a task, vexp’s run_pipeline:

  1. Parses the task description to find starting symbols (e.g., OrderController.processPayment).
  2. Traverses the graph outward (dependencies and dependents).
  3. Scores and ranks nodes by relevance and connectivity.
  4. Extracts only the relevant snippets from those nodes.
  5. Returns a compact, ranked context bundle.
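The five steps above can be sketched as a single function. This is an illustration of the approach, not vexp's implementation; the substring-based symbol matching and degree-based ranking are crude stand-ins for whatever parsing and centrality measures vexp actually uses:

```python
def run_pipeline(task: str, graph: dict[str, set[str]],
                 sources: dict[str, str], budget: int = 2000) -> list[str]:
    """Sketch of the pipeline: seed symbols from the task, traverse the
    graph in both directions, rank by connectivity, compress to budget."""
    # 1. Seed: symbols mentioned in the task description.
    seeds = {sym for sym in sources if sym in task}
    # 2. Traverse one hop outward: dependencies and dependents.
    frontier = set(seeds)
    for s in seeds:
        frontier |= graph.get(s, set())
        frontier |= {n for n, deps in graph.items() if s in deps}
    # 3. Rank: seeds first, then by in+out degree (a crude centrality proxy).
    def score(n):
        degree = len(graph.get(n, set())) + sum(n in d for d in graph.values())
        return (n in seeds, degree)
    ranked = sorted(frontier, key=score, reverse=True)
    # 4–5. Compress: keep snippets until the token budget is spent
    # (rough heuristic: ~4 characters per token).
    out, used = [], 0
    for n in ranked:
        snippet = sources.get(n, "")
        cost = len(snippet) // 4 or 1
        if used + cost > budget:
            break
        out.append(snippet)
        used += cost
    return out
```

Even this toy version shows the key inversion: the graph decides what the model reads, instead of the model deciding what to read by exploring.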

Result: the model sees fewer, more relevant tokens and usually performs better.

Where the 58% Savings Come From

  1. Input token reduction (~65%)
  • Fewer files and only partial snippets per file.
  • Example benchmark: ~85K → ~30K input tokens per task.
  2. Fewer exploration rounds
  • The first response already has the right context, reducing back-and-forth.
  3. Higher completion rate (+14pp)
  • Fewer failed attempts and retries (each retry doubles cost for that task).
  4. Session memory
  • vexp remembers what was useful across sessions, so context selection improves over time.
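The way these effects compose can be sanity-checked with rough arithmetic. The per-million-token prices and the output volume below are illustrative assumptions, not figures from the benchmark; input reduction alone accounts for about half the cost at these rates, and fewer retries and exploration rounds close the remaining gap to ~58%:

```python
# Illustrative rates only (example Sonnet-class pricing, $/million tokens).
IN_RATE, OUT_RATE = 3.0, 15.0

def task_cost(input_tok: int, output_tok: int) -> float:
    """Cost of one task at the assumed rates."""
    return input_tok / 1e6 * IN_RATE + output_tok / 1e6 * OUT_RATE

# ~85K input tokens per task (benchmark baseline) vs ~30K with vexp;
# 5K output tokens is an assumption, held constant in both cases.
baseline  = task_cost(85_000, 5_000)
with_vexp = task_cost(30_000, 5_000)

savings = 1 - with_vexp / baseline
print(f"{savings:.0%}")  # prints "50%"
```

Input reduction alone gets you to roughly 50% here; the remaining savings in the benchmark come from fewer exploration rounds and retries, which this static calculation doesn't model.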

When You’ll See the Biggest Gains

Larger savings if:

  • Codebase is large (>200 files)
  • Tasks span multiple modules
  • Team uses Claude Code heavily (hundreds of interactions/day)

Frequently Asked Questions

How can I reduce Claude Code token usage without losing quality?
Use a dependency-graph context engine like vexp to replace broad file loading with targeted retrieval. Instead of Claude scanning dozens of files to find relevant code, graph traversal returns only the structurally connected files for your task. This eliminates 65–70% of input tokens while actually improving output quality by reducing noise.

What is the main cause of excessive token usage in Claude Code?
The primary cause is irrelevant context loading. Claude Code uses keyword search and file-proximity heuristics to decide what to include, which consistently over-includes files. On a typical production codebase, 80–90% of the files loaded per session are not actually needed for the specific task at hand.

Does using a context engine like vexp require changing my workflow?
No. vexp runs as an MCP server that Claude Code calls automatically once configured. You add vexp to your MCP configuration once, and every subsequent session benefits from optimized context. There's no new tool to learn and no prompting strategies to adopt.
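For Claude Code, that one-time setup typically means a project-scoped `.mcp.json`. The vexp command and arguments below are placeholders, not the documented invocation, so check vexp's own docs for the real values:

```json
{
  "mcpServers": {
    "vexp": {
      "command": "vexp",
      "args": ["mcp", "--project", "."]
    }
  }
}
```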

How does graph-based context retrieval work in practice?
When you describe a task to Claude Code, vexp intercepts the context request and performs graph traversal starting from code symbols relevant to your task. It follows import/call edges outward to 2–3 hops, ranks nodes by centrality and relevance, and returns only the highest-value code snippets within your token budget. The entire process takes milliseconds.
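The hop-limited traversal described above can be sketched with a plain breadth-first search. Again, this is an illustration of the technique, not vexp's code:

```python
from collections import deque

def traverse(graph: dict[str, set[str]], seeds: set[str],
             max_hops: int = 2) -> dict[str, int]:
    """Breadth-first traversal over import/call edges, stopping after
    max_hops so distant, weakly related code is never pulled in.
    Returns each reached node with its hop distance from the seeds."""
    seen = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if seen[node] >= max_hops:
            continue  # don't expand past the hop limit
        for nbr in graph.get(node, set()):
            if nbr not in seen:
                seen[nbr] = seen[node] + 1
                queue.append(nbr)
    return seen
```

The hop limit is the key cost control: without it, traversal on a well-connected codebase would eventually reach almost everything, reproducing exactly the over-inclusion problem it's meant to fix.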

Is a 58% token reduction realistic for all types of projects?
The 58% figure is an average from benchmarks on production codebases. Projects with many loosely connected files (monorepos, microservices, large Python services) tend to see higher reductions (65–75%) because keyword search over-includes more. Smaller, tightly coupled codebases may see less benefit (40–50%), but some reduction was consistent across all codebase types tested.

Nicola

Developer and creator of vexp — a context engine for AI coding agents. I build tools that make AI coding assistants faster, cheaper, and actually useful on real codebases.
