How to Reduce Claude Code Token Usage by 58% (Without Manual Context Management)

Claude Code is great at reading your codebase—but terrible at stopping.
Ask it to fix a bug in auth, and it happily slurps in every nearby file: helpers, configs, unrelated models, and half the test suite. You pay for all of it, even if the fix lives in three functions.
This isn’t a prompt issue or a misconfiguration. It’s structural: Claude Code’s default context loader has no real understanding of your codebase’s dependency graph. It loads by proximity (directory, filename, loose heuristics), not by actual dependencies.
The result: in a 50K+ LOC production codebase, you routinely burn tens of thousands of tokens per task for a few hundred tokens of truly relevant code.
Below is a tested way to cut API costs by ~58% (and input tokens by ~65%) without touching your prompts: plug a dependency-graph context engine (vexp) into Claude Code via MCP.
Why Claude Code Over-Reads Your Codebase
Claude Code’s default behavior is intentionally conservative: when you ask it to fix or add something, it tries to avoid missing any important context. In practice, that means:
- It loads entire directories instead of specific call chains
- It follows loose associations (e.g., any file that ever imported a related module)
- It re-reads the same architectural files across sessions
Take a simple example: fixing a bug in an authentication function.
- Truly relevant context: the auth function, its direct dependencies, and the test file that covers it — maybe 500–1,000 tokens.
- What Claude Code often loads: every file in the auth directory, shared utilities, configs, unrelated models (easily 40,000+ tokens).
Your relevant-token ratio is often around 2.5%. You’re paying for 40K tokens to use 1K.
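The ratio is simple arithmetic. Here is a minimal sketch using the illustrative figures from the auth example (1,000 relevant tokens out of 40,000 loaded); the function name is hypothetical and the numbers are not measurements.

```python
def relevant_token_ratio(relevant_tokens: int, loaded_tokens: int) -> float:
    """Fraction of loaded input tokens that the task actually needed."""
    if loaded_tokens <= 0:
        raise ValueError("loaded_tokens must be positive")
    return relevant_tokens / loaded_tokens


# Illustrative figures from the auth example, not measurements.
ratio = relevant_token_ratio(relevant_tokens=1_000, loaded_tokens=40_000)
print(f"{ratio:.1%} of the loaded tokens were relevant")  # 2.5%
```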
What “58% Reduction” Actually Means
On a real FastAPI production codebase, we benchmarked Claude Sonnet on:
- 7 representative tasks:
Summary
Claude Code wastes tokens because it explores your codebase naively: it follows every import chain, reads whole files instead of relevant snippets, and has no persistent structural memory of what mattered before. This is a context selection and organization problem, not a model-quality problem.
A context engine like vexp fixes this by building and maintaining a dependency graph of your codebase (files, functions, classes, types, and their relationships). Instead of Claude Code reading 40+ files per task, vexp:
- Identifies relevant symbols from your task description
- Traverses the dependency graph (what your code calls and what calls it)
- Ranks nodes by importance/centrality
- Compresses context to only the necessary snippets
- Returns a focused "context capsule" for the model
In a FastAPI benchmark, this yielded:
- 58% lower API costs
- 65% fewer input tokens
- A completion rate 14 percentage points higher
You can get some of these benefits manually (precise @mentions, smaller tasks, CLAUDE.md, fresh sessions), but they don’t scale and rely on you knowing the codebase deeply.
Automating with vexp via MCP lets Claude Code (and other agents) call run_pipeline to get pre-indexed, ranked, compressed context, dramatically reducing wasteful tokens while often improving answer quality.
Key Problems vexp Solves
- Context selection: knowing which files, functions, and types matter before the model reads anything.
- Context ranking: ordering snippets so the most important code appears first in the prompt.
- Context compression: including only the relevant parts of each file instead of entire files.
Together, these three tasks make up context engineering. Doing them manually is possible but brittle and time-consuming.
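To make context compression concrete, here is a minimal sketch that keeps only the named functions from a file instead of the whole file. It uses Python's standard ast module; the sample source and the extract_functions helper are hypothetical, and this is not a description of how vexp itself works.

```python
import ast

# Hypothetical file contents: two relevant functions and one unrelated one.
SOURCE = '''\
import hashlib

def hash_password(password: str) -> str:
    return hashlib.sha256(password.encode()).hexdigest()

def verify_password(password: str, expected: str) -> bool:
    return hash_password(password) == expected

def unrelated_report() -> str:
    return "quarterly numbers"
'''


def extract_functions(source: str, wanted: set[str]) -> str:
    """Return only the source of the top-level functions named in `wanted`."""
    tree = ast.parse(source)
    snippets = []
    for node in tree.body:
        is_function = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_function and node.name in wanted:
            segment = ast.get_source_segment(source, node)
            if segment is not None:
                snippets.append(segment)
    return "\n\n".join(snippets)


compressed = extract_functions(SOURCE, {"verify_password", "hash_password"})
print(compressed)
```

The unrelated function never reaches the prompt, so the model reads two short definitions rather than the full file.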
Manual Tactics (Baseline Improvements)
You can reduce Claude Code’s token usage today by:
- Specific @mentions
  - Bad: "Fix the payment processing bug"
  - Better: "Fix the bug in PaymentProcessor.chargeCard(): @src/payments/PaymentProcessor.ts @src/types/Transaction.ts"
  - Typical savings: 20–40% fewer tokens for well-scoped tasks.
- Smaller, scoped tasks
  - Break big refactors into concrete, function-level tasks.
- CLAUDE.md for structure
  - Document key modules and directories so Claude navigates faster.
- Fresh sessions
  - Avoid long, drifting sessions where early context stops being useful.
Limitations: these depend on your knowledge, discipline, and ongoing maintenance; they don’t give the agent a structural map of your codebase.
Automated Approach: vexp Context Engine
vexp builds and maintains a dependency graph of your codebase:
- Nodes: files, functions, classes, types
- Edges: imports, calls, inheritance, references
When Claude Code receives a task, vexp’s run_pipeline:
- Parses the task description to find starting symbols (e.g., OrderController.processPayment).
- Traverses the graph outward (dependencies and dependents).
- Scores and ranks nodes by relevance and connectivity.
- Extracts only the relevant snippets from those nodes.
- Returns a compact, ranked context bundle.
Result: the model sees fewer, more relevant tokens and usually performs better.
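The traversal and ranking steps can be sketched in a few lines. This is an illustration under stated assumptions, not vexp's implementation: the graph is a hand-written dictionary, OrderRepo.save and ReportJob.run are hypothetical symbols, and the scoring rule (fewer hops first, then more connections) is a simple stand-in for whatever relevance scoring a real engine uses.

```python
from collections import deque

# Hypothetical dependency graph: symbol -> symbols it calls or imports.
CALLS = {
    "OrderController.processPayment": ["PaymentProcessor.chargeCard", "OrderRepo.save"],
    "PaymentProcessor.chargeCard": ["Transaction"],
    "OrderRepo.save": ["Transaction"],
    "ReportJob.run": ["OrderRepo.save"],
    "Transaction": [],
}


def reverse_edges(graph):
    """Build the dependents map (which symbols call each symbol)."""
    callers = {node: [] for node in graph}
    for src, targets in graph.items():
        for dst in targets:
            callers.setdefault(dst, []).append(src)
    return callers


def select_context(graph, start, max_hops=2, limit=4):
    """Breadth-first walk over dependencies and dependents, then rank."""
    callers = reverse_edges(graph)
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop budget
        for neighbor in graph.get(node, []) + callers.get(node, []):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)

    def score(node):
        # Closer nodes rank higher; connectivity breaks ties.
        degree = len(graph.get(node, [])) + len(callers.get(node, []))
        return (-hops[node], degree)

    return sorted(hops, key=score, reverse=True)[:limit]


selected = select_context(CALLS, "OrderController.processPayment")
print(selected)
```

With the default limit of four, the distant ReportJob.run is dropped while the starting symbol and its nearest neighbors are kept, which is the behavior the steps above describe.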
Where the 58% Savings Come From
- Input token reduction (~65%)
  - Fewer files, and only partial snippets from each file.
  - Example benchmark: ~85k → ~30k input tokens per task.
- Fewer exploration rounds
  - The first response already has the right context, reducing back-and-forth.
- Higher completion rate (+14pp)
  - Fewer failed attempts and retries (each retry adds roughly the cost of another full attempt).
- Session memory
  - vexp remembers what was useful across sessions, so context selection improves over time.
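As a quick sanity check, the approximate per-task figures quoted above (~85k down to ~30k input tokens) do work out to roughly the stated 65% reduction. The helper name below is hypothetical.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, as a percentage."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100


# Approximate per-task input tokens quoted in the benchmark example.
reduction = percent_reduction(before=85_000, after=30_000)
print(f"{reduction:.1f}% fewer input tokens")  # 64.7%
```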
When You’ll See the Biggest Gains
Expect larger savings if:
- The codebase is large (>200 files)
- Tasks span multiple modules
- The team uses Claude Code heavily (hundreds of interactions/day)
Nicola
Developer and creator of vexp — a context engine for AI coding agents. I build tools that make AI coding assistants faster, cheaper, and actually useful on real codebases.