How to Reduce Claude Code Token Usage by 58% (Without Manual Context Management)

Claude Code is great at reading your codebase—but terrible at stopping.
Ask it to fix a bug in auth, and it happily slurps in every nearby file: helpers, configs, unrelated models, and half the test suite. You pay for all of it, even if the fix lives in three functions.
This isn’t a prompt issue or a misconfiguration. It’s structural: Claude Code’s default context loader has no real understanding of your codebase’s dependency graph. It loads by proximity (directory, filename, loose heuristics), not by actual dependencies.
The result: in a 50K+ LOC production codebase, you routinely burn tens of thousands of tokens per task for a few hundred tokens of truly relevant code.
Below is a tested way to cut that waste by ~58%—without touching your prompts—by plugging a dependency-graph context engine (vexp) into Claude Code via MCP.
Why Claude Code Over-Reads Your Codebase
Claude Code’s default behavior is intentionally conservative: when you ask it to fix or add something, it tries to avoid missing any important context. In practice, that means:
- It loads entire directories instead of specific call chains
- It follows loose associations (e.g., any file that ever imported a related module)
- It re-reads the same architectural files across sessions
Take a simple example: fixing a bug in an authentication function.
- Truly relevant context: the auth function, its direct dependencies, and the test file that covers it — maybe 500–1,000 tokens.
- What Claude Code often loads: every file in the auth directory, shared utilities, configs, unrelated models — easily 40,000+ tokens.
Your relevant-token ratio is often around 2.5%. You’re paying for 40K tokens to use 1K.
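The waste is easy to quantify. A minimal sketch using the illustrative numbers above (these are assumptions for the example, not measurements from any specific codebase):

```python
# Rough token-waste estimate using the article's illustrative numbers.
relevant_tokens = 1_000   # auth function + direct deps + its test file
loaded_tokens = 40_000    # everything Claude Code actually pulled in

ratio = relevant_tokens / loaded_tokens
wasted = loaded_tokens - relevant_tokens

print(f"relevant-token ratio: {ratio:.1%}")   # 2.5%
print(f"wasted tokens per task: {wasted:,}")  # 39,000
```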
What “58% Reduction” Actually Means
On a real FastAPI production codebase, we benchmarked Claude Sonnet on 7 representative tasks.
Summary
Claude Code wastes tokens because it explores your codebase naively: it follows every import chain, reads whole files instead of relevant snippets, and has no persistent structural memory of what mattered before. This is a context selection and organization problem, not a model-quality problem.
A context engine like vexp fixes this by building and maintaining a dependency graph of your codebase (files, functions, classes, types, and their relationships). Instead of Claude Code reading 40+ files per task, vexp:
- Identifies relevant symbols from your task description
- Traverses the dependency graph (what your code calls and what calls it)
- Ranks nodes by importance/centrality
- Compresses context to only the necessary snippets
- Returns a focused "context capsule" for the model
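Conceptually, the steps above look like a graph traversal plus a ranking pass. Here is a minimal sketch with a toy dependency graph; the symbol names are hypothetical, and vexp's real indexing, symbol extraction, and scoring are more sophisticated than this:

```python
from collections import deque

# Toy dependency graph: symbol -> symbols it references (calls/imports).
# All names are illustrative, not from a real codebase.
GRAPH = {
    "auth.login": ["auth.verify_password", "db.get_user"],
    "auth.verify_password": ["crypto.hash"],
    "db.get_user": ["db.connect"],
    "crypto.hash": [],
    "db.connect": [],
    "billing.charge": ["db.connect"],  # unrelated to the task, never visited
}

def build_context(seed_symbols, max_depth=2):
    """BFS outward from the task's seed symbols, then rank by
    distance from the seed and how often each symbol is referenced."""
    depth = {s: 0 for s in seed_symbols}
    queue = deque(seed_symbols)
    while queue:
        sym = queue.popleft()
        if depth[sym] >= max_depth:
            continue
        for dep in GRAPH.get(sym, []):
            if dep not in depth:
                depth[dep] = depth[sym] + 1
                queue.append(dep)
    # Connectivity score: how many graph nodes reference each symbol.
    in_degree = {s: 0 for s in depth}
    for deps in GRAPH.values():
        for dep in deps:
            if dep in in_degree:
                in_degree[dep] += 1
    # Closer-to-seed first; among ties, more-referenced symbols first.
    return sorted(depth, key=lambda s: (depth[s], -in_degree[s]))

print(build_context(["auth.login"]))
```

Only symbols reachable from the task's seed enter the context; everything else (like billing.charge here) is never read at all.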
In a FastAPI benchmark, this yielded:
- 58% lower API costs
- 65% fewer input tokens
- 14 percentage point higher completion rate
You can get some of these benefits manually (precise @mentions, smaller tasks, CLAUDE.md, fresh sessions), but they don’t scale and rely on you knowing the codebase deeply.
Automating with vexp via MCP lets Claude Code (and other agents) call run_pipeline to get pre-indexed, ranked, compressed context, dramatically reducing wasteful tokens while often improving answer quality.
Key Problems vexp Solves
- Context selection
Knowing which files, functions, and types matter before the model reads anything.
- Context ranking
Ordering snippets so the most important code appears first in the prompt.
- Context compression
Including only the relevant parts of each file instead of entire files.
Together, these three make up context engineering. Doing them manually is possible but brittle and time-consuming.
Manual Tactics (Baseline Improvements)
You can reduce Claude Code’s token usage today by:
- Specific @mentions
  - Bad: Fix the payment processing bug
  - Better: Fix the bug in PaymentProcessor.chargeCard() — @src/payments/PaymentProcessor.ts @src/types/Transaction.ts
  - Typical savings: 20–40% tokens for well-scoped tasks.
- Smaller, scoped tasks
- Break big refactors into concrete, function-level tasks.
- CLAUDE.md for structure
- Document key modules and directories so Claude navigates faster.
- Fresh sessions
- Avoid long, drifting sessions where early context becomes ineffective.
Limitations: these depend on your knowledge, discipline, and ongoing maintenance; they don’t give the agent a structural map of your codebase.
Automated Approach: vexp Context Engine
vexp builds and maintains a dependency graph of your codebase:
- Nodes: files, functions, classes, types
- Edges: imports, calls, inheritance, references
When Claude Code receives a task, vexp’s run_pipeline:
- Parses the task description to find starting symbols (e.g., OrderController.processPayment).
- Traverses the graph outward (dependencies and dependents).
- Scores and ranks nodes by relevance and connectivity.
- Extracts only the relevant snippets from those nodes.
- Returns a compact, ranked context bundle.
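The snippet-extraction step can be sketched with Python's ast module: instead of shipping a whole file, return only the top-level definitions the graph identified as relevant. This is an assumption about the approach (vexp's actual extractor is not shown in this article), and the module source below is invented for illustration:

```python
import ast

def extract_snippets(source: str, symbols: set[str]) -> str:
    """Return only the named top-level functions/classes from a module,
    instead of sending the entire file to the model."""
    tree = ast.parse(source)
    parts = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            if node.name in symbols:
                # get_source_segment recovers the exact original text
                # of this definition, comments and formatting included.
                parts.append(ast.get_source_segment(source, node))
    return "\n\n".join(parts)

module = '''
def charge_card(user, amount):
    return gateway.charge(user.card, amount)

def unrelated_helper():
    pass
'''

print(extract_snippets(module, {"charge_card"}))
```

Only charge_card reaches the prompt; unrelated_helper costs zero tokens.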
Result: the model sees fewer, more relevant tokens and usually performs better.
Where the 58% Savings Come From
- Input token reduction (~65%)
- Fewer files and only partial snippets per file.
- Example benchmark: ~85k → ~30k input tokens per task.
- Fewer exploration rounds
- The first response already has the right context, reducing back-and-forth.
- Higher completion rate (+14pp)
- Fewer failed attempts and retries (each retry doubles cost for that task).
- Session memory
- vexp remembers what was useful across sessions, so context selection improves over time.
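These effects compound. A back-of-the-envelope model with assumed completion rates (the token figures match the benchmark above, but the rates and the geometric-retry assumption are mine) shows why input reduction and fewer retries multiply:

```python
# Per-attempt input tokens, matching the ~65% reduction above.
baseline_tokens = 85_000
vexp_tokens = 30_000

# Assumed completion probabilities per attempt; +14pp for vexp.
p_baseline = 0.70
p_vexp = 0.84

# Geometric retry model: expected attempts until success = 1/p,
# so expected input tokens per *completed* task = tokens / p.
expected_baseline = baseline_tokens / p_baseline
expected_vexp = vexp_tokens / p_vexp

savings = 1 - expected_vexp / expected_baseline
print(f"expected input tokens per completed task: "
      f"{expected_baseline:,.0f} -> {expected_vexp:,.0f} ({savings:.0%} lower)")
```

Note this toy model only counts input tokens; real dollar savings land lower (the benchmark's 58%) because output tokens don't shrink proportionally.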
When You’ll See the Biggest Gains
Larger savings if:
- Codebase is large (>200 files)
- Tasks span multiple modules
- Team uses Claude Code heavily (hundreds of interactions/day)
Frequently Asked Questions
How can I reduce Claude Code token usage without losing quality?
Use precise @mentions, break work into small scoped tasks, maintain a CLAUDE.md, and start fresh sessions — or automate context selection with a dependency-graph engine like vexp.
What is the main cause of excessive token usage in Claude Code?
Naive context loading: Claude Code loads by proximity (directories, filenames, loose heuristics) rather than by your codebase's actual dependency graph, so it reads far more code than a task needs.
Does using a context engine like vexp require changing my workflow?
No. vexp plugs into Claude Code via MCP; the agent calls run_pipeline to fetch pre-indexed, ranked, compressed context without changes to your prompts.
How does graph-based context retrieval work in practice?
vexp indexes files, functions, classes, and types as nodes and imports, calls, inheritance, and references as edges, then traverses outward from the symbols in your task, ranks nodes by relevance and connectivity, and returns only the necessary snippets.
Is a 58% token reduction realistic for all types of projects?
Not universally — the figure comes from a FastAPI benchmark. Gains are largest on large codebases (>200 files), multi-module tasks, and heavy daily usage; small projects will see less.
Nicola
Developer and creator of vexp — a context engine for AI coding agents. I build tools that make AI coding assistants faster, cheaper, and actually useful on real codebases.
Related Articles

Claude Code Has No Session Memory — Here's How to Add It
Claude Code is stateless between sessions. Learn how to add scalable, code-linked session memory using CLAUDE.md and vexp.

Context Window Management for AI Coding: The Developer's Guide
Learn how AI context windows work, why long coding sessions degrade, and practical strategies and tools like vexp to keep Claude effective and costs low.

Cursor vs Claude Code vs Copilot 2026: The Only Comparison You Need
A practical 2026 comparison of GitHub Copilot, Cursor, and Claude Code based on real production use, with a focus on context, agentic workflows, and pricing.