Why Claude Code Burns Through Tokens So Fast (And How to Fix It)

You add Claude Code to your workflow with reasonable expectations. A few hundred dollars a month, maybe. A month later you're looking at a bill that's three times what you budgeted, and you're not sure what happened.
You're not alone. This is probably the most common complaint I see from developers who use Claude Code heavily: the token burn rate feels disconnected from the actual work being done. You make 50 changes and somehow burn through $80 worth of API calls.
Here's what's actually happening — and what you can do about it.
The Core Problem: Context Is Cumulative
Every API call to Claude includes your entire conversation history plus whatever new content you add. This isn't a design flaw — it's how the model maintains context across a session. But it has a compounding effect that most developers underestimate.
Let's say you have a session where:
- Turn 1: 2,000 tokens (system prompt + first question)
- Turn 2: 4,000 tokens (turn 1 + model response + your follow-up)
- Turn 3: 7,000 tokens (turn 2 + model response + your follow-up)
- Turn 4: 11,000 tokens
- …
After 10 turns, you might be sending 40,000+ tokens per request — even though your actual question is 200 words. The conversation history is doing most of the work.
Most developers think of token cost as:
(length of my question) × (cost per token)
The reality is:
(total accumulated context) × (cost per token), every single turn.
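The compounding above can be sketched as a quick simulation. The token counts here are illustrative assumptions, not measurements:

```typescript
// Sketch: how input tokens compound across a session.
// systemPrompt is re-sent every turn; each turn's question and the
// model's response join the history and get re-sent on all later turns.
function totalInputTokens(
  systemPrompt: number,   // tokens sent on every turn
  questionTokens: number, // your new message each turn
  responseTokens: number, // model output, re-sent as history next turn
  turns: number,
): number {
  let history = 0; // accumulated conversation history
  let billed = 0;  // total input tokens billed across the session
  for (let t = 0; t < turns; t++) {
    billed += systemPrompt + history + questionTokens;
    history += questionTokens + responseTokens; // grows every turn
  }
  return billed;
}

// 10 turns with a 2,000-token system prompt, 200-token questions,
// and 800-token responses bills 67,000 input tokens in total,
// even though you only typed ~2,000 tokens of questions.
```

The point of the sketch: the billed total grows quadratically with turn count, which is why long sessions feel so much more expensive than the sum of your questions.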
What's Actually In Your Context
Let's break down where tokens go in a typical Claude Code session:
System prompt
2,000–5,000 tokens for a typical setup (instructions, project context, CLAUDE.md if you have one).
File content
This is the killer. If you paste a 200-line TypeScript file into context, that's roughly 2,000–3,000 tokens. If you reference 5 files across a session, you're carrying 10,000–15,000 tokens of file content.
Conversation history
Every back-and-forth adds up. A 20-message session might accumulate 30,000–50,000 tokens just in history.
Model responses
Claude's outputs are typically longer than your inputs — detailed explanations, full code blocks, suggested refactors. These count as output tokens, which are priced higher (5x the input rate for Claude 3.5 Sonnet).
Error outputs
Stack traces, test failures, build logs. These can be 500–2,000 tokens each, and if you paste the same error multiple times, it multiplies.
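To get a rough sense of what a paste will cost before sending it, a common rule of thumb is about 4 characters per token for English text and code. This is only an approximation, not the model's real tokenizer:

```typescript
// Rough token estimate: ~4 characters per token is a common rule of
// thumb for English prose and code. The real tokenizer can differ
// noticeably, so treat this as a ballpark, not a bill.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// A 200-line file at ~50 chars/line is ~10,000 chars, i.e. ~2,500
// tokens, in line with the 2,000-3,000 range above.
```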
The Specific Behaviors That Burn Tokens Fast
Not all usage patterns are equally expensive. These five behaviors drive the highest token costs:
1. Starting every session by dumping your codebase
Some developers have a habit of pasting large files "for context" at the start of every session. If you do this with 3–4 files before even asking a question, you've spent 10,000+ tokens before any work happens. And those tokens get resent with every subsequent turn.
2. Long exploratory sessions without clearing
An exploratory session — "let me understand this codebase" — where you're just asking questions and getting explanations can accumulate massive context. You're paying for the history of the exploration on every turn, even when you move to implementation.
3. Iterating with the full file in context
"Here's the full file, please modify line 47." The model needs to see line 47 and its surrounding context. It probably doesn't need lines 1–200. Sending the full 300-line file to change one function costs 3x more than sending just that function.
4. Asking Claude to read files it already read
In tool-enabled setups, asking Claude to re-read files it already processed in the current session re-adds those files to the accumulating context. Each re-read is additive.
5. Not starting fresh sessions for new problems
When you finish one task and start another in the same session, you carry all the context from the first task into the second. That context was useful for task 1 but is pure noise for task 2 — noise you're paying for on every turn.
How Much Context Is Wasted?
In a typical Claude Code workflow, a significant portion of the input tokens in each API call is not relevant to the current question.
If you're 30 messages into a session working on a payment module, and you ask a question about your auth system, maybe 20% of the context (recent payment module work) is even tangentially relevant. The other 80% — the earlier exploration, the abandoned approaches, the unrelated fixes — is payload you're carrying but not using.
See also: The Token Waste Problem: 80% of AI Coding Tokens Are Irrelevant.
The Fix: Precision Context
The solution isn't to give Claude less context — it's to give Claude the right context. There's a meaningful difference:
Too little context
"Fix line 47." The model doesn't know enough to make a correct fix.
Too much context
Your entire 30-message session history plus 5 pasted files. The model gets the right answer buried in noise, but you paid for 100,000 tokens.
Precision context
The specific function and its direct dependencies (callers, callees, types). ~8,000 tokens, high relevance. The model has what it needs without the noise.
The key insight: a 65–70% reduction in input tokens doesn't mean giving the model less information. It means giving it the right information, extracted from your codebase graph.
See: How to Reduce Claude Code Token Usage by 58%.
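As a miniature sketch of what "extracted from your codebase graph" could look like: given a call graph, keep the target function plus its direct callers and callees instead of whole files. The graph shape and selection logic here are hypothetical illustrations, not any tool's actual implementation:

```typescript
// Hypothetical call graph: each function maps to the functions it
// calls and its own source text.
type CallGraph = Record<string, { calls: string[]; source: string }>;

// Build a precision slice: the target, its callees, and its direct
// callers. Everything else in the codebase is left out of context.
function precisionSlice(graph: CallGraph, target: string): string {
  const keep = new Set<string>([target]);
  for (const callee of graph[target]?.calls ?? []) keep.add(callee);
  for (const [name, node] of Object.entries(graph)) {
    if (node.calls.includes(target)) keep.add(name); // direct callers
  }
  return [...keep]
    .filter((name) => name in graph) // skip externals we have no source for
    .map((name) => graph[name].source)
    .join("\n\n");
}
```

A real context engine would also pull in type definitions and transitive dependencies up to a budget, but the principle is the same: select by relevance, not by file boundaries.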
Practical Changes You Can Make Right Now
1. Reset your session for new problems
When you finish a task and start a different one, open a new chat. It takes a few seconds and can cut the token cost of the new task in half by removing accumulated irrelevant history.
2. Be specific about what you paste
Instead of pasting an entire file, paste just the function or class you're working on. Add a brief description, for example:
This is the processPayment function. It calls validateCard (which validates Stripe cards) and chargeStripe (which handles the API call). I need to add retry logic.
That's ~200 tokens instead of 3,000 tokens for the full file, and the model has the context it needs.
3. Use the /clear command strategically
In Claude Code, /clear resets the conversation history without closing the session. Use it when you've finished a sub-task and want to start fresh without starting a completely new session.
4. Watch for file-reading loops
If you notice Claude reads the same file multiple times in a session, that's a sign the context isn't being managed well. Each read adds tokens. Use targeted file references instead of repeated reads.
5. Set token budget alerts
Most API dashboards let you set spending alerts. Set one at your expected monthly budget and another at 1.5x. Alerts surface burning behavior before it becomes a problem, not after.
The Automated Approach
Manual discipline helps, but it's friction. The more sustainable fix is automated context extraction that solves the problem architecturally.
Graph-based context engines index your codebase and extract exactly the relevant code for each query. Instead of session history, the model gets a fresh, precise slice of your codebase every time. No accumulated noise, no stale file dumps.
In benchmarked tests using a FastAPI codebase (7 representative tasks, 21 runs per condition, Claude 3.5 Sonnet): automated context extraction achieved ~58% cost reduction with ~63% fewer output tokens and ~22% faster task completion. The reduction came from providing less noise, which let the model produce more precise, shorter answers.
See: Context Engineering for AI Coding Agents.
Understanding Your Claude Bill
Quick reference for Claude 3.5 Sonnet pricing (as of late 2024):
| Component | Value |
|-----------|-------|
| Input tokens | $3 / 1M tokens |
| Output tokens | $15 / 1M tokens |
| Typical session input | 30,000–80,000 tokens |
| Typical session output | 5,000–15,000 tokens |
| Cost per heavy session | $0.17–$0.47 |
A developer doing 20 heavy sessions per day: $3.40–$9.40/day, or roughly $70–$190/month. That's around where the "this is expensive" threshold sits for many teams.
With precision context, the same developer's input tokens can drop by 65–70%, bringing that to roughly $25–$80/month. Same work, lower bill.
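The arithmetic behind the table is worth checking yourself. Here it is as code, at the prices quoted above:

```typescript
// Claude 3.5 Sonnet API prices from the table:
// $3 per 1M input tokens, $15 per 1M output tokens.
const INPUT_PRICE_PER_TOKEN = 3 / 1_000_000;
const OUTPUT_PRICE_PER_TOKEN = 15 / 1_000_000;

// Dollar cost of a single session given its token counts.
function sessionCost(inputTokens: number, outputTokens: number): number {
  return (
    inputTokens * INPUT_PRICE_PER_TOKEN +
    outputTokens * OUTPUT_PRICE_PER_TOKEN
  );
}

// Heavy end of the table: 80,000 input + 15,000 output
// = $0.24 + $0.225 = $0.465 per session, the top of the $0.17-$0.47 range.
```

Notice that output tokens dominate less than you might expect: even at 5x the price, 15,000 output tokens cost about the same as 75,000 input tokens, which is why trimming accumulated input context moves the bill so much.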
FAQ
Q: Does using Claude Code on the Max plan change this?
The Max plan (e.g., $100/month) is a flat subscription rather than per-token API billing, but your usage allowance is still measured in tokens, so the mechanics are identical — context accumulation still drives how fast you burn through it. What changes is that the cost is capped; instead of a bigger bill, you hit rate limits sooner.
Q: Does longer context actually hurt quality?
Often yes, past a certain point. Research on the "lost in the middle" phenomenon shows models give less attention to content buried in the middle of very long contexts. More context isn't always better context.
Q: Is there a way to see exactly what's in Claude's context window?
In Claude Code, you can't directly inspect the full context, but you can estimate it from the conversation length and any files referenced. The API does report input token counts per call.
Q: Should I use Claude Haiku instead of Sonnet to save money?
For simple tasks and exploratory questions, Haiku can be much cheaper at comparable quality. For complex code generation and debugging, Sonnet's quality improvement usually justifies the cost. The bigger win is context efficiency, which reduces cost for any model tier.
Q: What about caching?
Claude supports prompt caching, which can reduce repeated content costs by ~90%. If your system prompt and project context are stable, caching them is high-value. But caching helps with the fixed overhead, not with the variable accumulation of conversation history.
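As a sketch, marking a stable system prompt as cacheable in an Anthropic Messages API request looks roughly like this. Field names follow Anthropic's prompt caching feature as of late 2024, and the model string and prompt text are placeholders; verify against the current API reference before relying on this:

```typescript
// Request body shape with the stable system prompt marked cacheable.
// Only the marked block is cached; the per-turn messages are not.
const request = {
  model: "claude-3-5-sonnet-20241022", // example model string
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: "<stable project context, e.g. your CLAUDE.md contents>",
      cache_control: { type: "ephemeral" }, // reused across calls at a steep discount
    },
  ],
  messages: [
    { role: "user", content: "Add retry logic to processPayment." },
  ],
};
```

This is exactly the "fixed overhead" case from the answer above: the cached system block is cheap on every call, while the growing `messages` history is still billed at full input rates.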

```typescript
// Example: naive vs precision context for a small change
// ❌ Naive: send the entire file every time
//    (~300 lines, ~3,000 tokens per turn)
// ✅ Precision: send only what matters

// Illustrative supporting type; the real definition lives elsewhere
// in the codebase, but its shape is part of the precision slice
interface PaymentInput {
  card: { number: string; exp: string };
  amountCents: number;
}

interface ChargeResult {
  success: boolean;
  errorCode?: string;
}

// Direct dependencies, included as declarations so the model
// sees the signatures without the full implementations
declare function validateCard(card: PaymentInput["card"]): { ok: boolean };
declare function chargeStripe(input: PaymentInput): Promise<ChargeResult>;

async function processPayment(input: PaymentInput): Promise<ChargeResult> {
  const validated = validateCard(input.card);
  if (!validated.ok) {
    return { success: false, errorCode: "card_invalid" };
  }
  // TODO: add retry logic around chargeStripe
  return chargeStripe(input);
}

// Prompt suggestion:
// "This is processPayment. It calls validateCard (Stripe card validation)
// and chargeStripe (Stripe API call). Add retry logic with exponential
// backoff around chargeStripe, up to 3 attempts."
```
Nicola
Developer and creator of vexp — a context engine for AI coding agents. I build tools that make AI coding assistants faster, cheaper, and actually useful on real codebases.
Related Articles

Vibe Coding Is Fun Until the Bill Arrives: Token Optimization Guide
Vibe coding with AI is addictive but expensive. Freestyle prompting without context management burns tokens 3-5x faster than structured workflows.

Windsurf Credits Running Out? How to Use Fewer Tokens Per Task
Windsurf credits deplete fast because the AI processes too much irrelevant context. Reduce what it needs to read and your credits last 2-3x longer.

Best AI Coding Tool for Startups: Balancing Cost, Speed, and Quality
Startups need speed and budget control. The ideal AI coding stack combines a free/cheap agent with context optimization — here's how to set it up.