Why Claude Code Burns Through Tokens So Fast (And How to Fix It)

You add Claude Code to your workflow with reasonable expectations. A few hundred dollars a month, maybe. A month later you're looking at a bill that's three times what you budgeted, and you're not sure what happened.
You're not alone. This is probably the most common complaint I see from developers who use Claude Code heavily: the token burn rate feels disconnected from the actual work being done. You make 50 changes and somehow burn through $80 worth of API calls.
Here's what's actually happening — and what you can do about it.
The Core Problem: Context Is Cumulative
Every API call to Claude includes your entire conversation history plus whatever new content you add. This isn't a design flaw — it's how the model maintains context across a session. But it has a compounding effect that most developers underestimate.
Let's say you have a session where:
- Turn 1: 2,000 tokens (system prompt + first question)
- Turn 2: 4,000 tokens (turn 1 + model response + your follow-up)
- Turn 3: 7,000 tokens (turn 2 + model response + your follow-up)
- Turn 4: 11,000 tokens
- …
After 10 turns, you might be sending 40,000+ tokens per request — even though your actual question is 200 words. The conversation history is doing most of the work.
Most developers think of token cost as:
(length of my question) × (cost per token)
The reality is:
(total accumulated context) × (cost per token), every single turn.
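The compounding above can be sketched as a quick simulation. The token counts here are illustrative assumptions, not measurements:

```typescript
// Sketch: how input tokens compound across a session.
// systemPrompt is re-sent every turn; each turn's question and the
// model's response join the history and get re-sent on all later turns.
function totalInputTokens(
  systemPrompt: number,   // tokens sent on every turn
  questionTokens: number, // your new message each turn
  responseTokens: number, // model output, re-sent as history next turn
  turns: number,
): number {
  let history = 0; // accumulated conversation history
  let billed = 0;  // total input tokens billed across the session
  for (let t = 0; t < turns; t++) {
    billed += systemPrompt + history + questionTokens;
    history += questionTokens + responseTokens; // grows every turn
  }
  return billed;
}

// 10 turns with a 2,000-token system prompt, 200-token questions,
// and 800-token responses bills 67,000 input tokens in total,
// even though you only typed ~2,000 tokens of questions.
```

The point of the sketch: the billed total grows quadratically with turn count, which is why long sessions feel so much more expensive than the sum of your questions.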
What's Actually In Your Context
Let's break down where tokens go in a typical Claude Code session:
System prompt
2,000–5,000 tokens for a typical setup (instructions, project context, CLAUDE.md if you have one).
File content
This is the killer. If you paste a 200-line TypeScript file into context, that's roughly 2,000–3,000 tokens. If you reference 5 files across a session, you're carrying 10,000–15,000 tokens of file content.
Conversation history
Every back-and-forth adds up. A 20-message session might accumulate 30,000–50,000 tokens just in history.
Model responses
Claude's outputs are typically longer than your inputs — detailed explanations, full code blocks, suggested refactors. These count as output tokens, which are priced higher (5x the input rate for Claude 3.5 Sonnet).
Error outputs
Stack traces, test failures, build logs. These can be 500–2,000 tokens each, and if you paste the same error multiple times, it multiplies.
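To get a rough sense of what a paste will cost before sending it, a common rule of thumb is about 4 characters per token for English text and code. This is only an approximation, not the model's real tokenizer:

```typescript
// Rough token estimate: ~4 characters per token is a common rule of
// thumb for English prose and code. The real tokenizer can differ
// noticeably, so treat this as a ballpark, not a bill.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// A 200-line file at ~50 chars/line is ~10,000 chars, i.e. ~2,500
// tokens, in line with the 2,000-3,000 range above.
```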
The Specific Behaviors That Burn Tokens Fast
Not all usage patterns are equally expensive. These five behaviors drive the highest token costs:
1. Starting every session by dumping your codebase
Some developers have a habit of pasting large files "for context" at the start of every session. If you do this with 3–4 files before even asking a question, you've spent 10,000+ tokens before any work happens. And those tokens get resent with every subsequent turn.
2. Long exploratory sessions without clearing
An exploratory session — "let me understand this codebase" — where you're just asking questions and getting explanations can accumulate massive context. You're paying for the history of the exploration on every turn, even when you move to implementation.
3. Iterating with the full file in context
"Here's the full file, please modify line 47." The model needs to see line 47 and its surrounding context. It probably doesn't need lines 1–200. Sending the full 300-line file to change one function costs 3x more than sending just that function.
4. Asking Claude to read files it already read
In tool-enabled setups, asking Claude to re-read files it already processed in the current session re-adds those files to the accumulating context. Each re-read is additive.
5. Not starting fresh sessions for new problems
When you finish one task and start another in the same session, you carry all the context from the first task into the second. That context was useful for task 1 but is pure noise for task 2 — noise you're paying for on every turn.
How Much Context Is Wasted?
In a typical Claude Code workflow, a significant portion of the input tokens in each API call is not relevant to the current question.
If you're 30 messages into a session working on a payment module, and you ask a question about your auth system, maybe 20% of the context (recent payment module work) is even tangentially relevant. The other 80% — the earlier exploration, the abandoned approaches, the unrelated fixes — is payload you're carrying but not using.
See also: The Token Waste Problem: 80% of AI Coding Tokens Are Irrelevant.
The Fix: Precision Context
The solution isn't to give Claude less context — it's to give Claude the right context. There's a meaningful difference:
Too little context
"Fix line 47." The model doesn't know enough to make a correct fix.
Too much context
Your entire 30-message session history plus 5 pasted files. The model gets the right answer buried in noise, but you paid for 100,000 tokens.
Precision context
The specific function and its direct dependencies (callers, callees, types). ~8,000 tokens, high relevance. The model has what it needs without the noise.
The key insight: a 65–70% reduction in input tokens doesn't mean giving the model less information. It means giving it the right information, extracted from your codebase graph.
See: How to Reduce Claude Code Token Usage by 58%.
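As a miniature sketch of what "extracted from your codebase graph" could look like: given a call graph, keep the target function plus its direct callers and callees instead of whole files. The graph shape and selection logic here are hypothetical illustrations, not any tool's actual implementation:

```typescript
// Hypothetical call graph: each function maps to the functions it
// calls and its own source text.
type CallGraph = Record<string, { calls: string[]; source: string }>;

// Build a precision slice: the target, its callees, and its direct
// callers. Everything else in the codebase is left out of context.
function precisionSlice(graph: CallGraph, target: string): string {
  const keep = new Set<string>([target]);
  for (const callee of graph[target]?.calls ?? []) keep.add(callee);
  for (const [name, node] of Object.entries(graph)) {
    if (node.calls.includes(target)) keep.add(name); // direct callers
  }
  return [...keep]
    .filter((name) => name in graph) // skip externals we have no source for
    .map((name) => graph[name].source)
    .join("\n\n");
}
```

A real context engine would also pull in type definitions and transitive dependencies up to a budget, but the principle is the same: select by relevance, not by file boundaries.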
Practical Changes You Can Make Right Now
1. Reset your session for new problems
When you finish a task and start a different one, open a new chat. It takes a few seconds and can cut the token cost of the new task in half by removing accumulated irrelevant history.
2. Be specific about what you paste
Instead of pasting an entire file, paste just the function or class you're working on. Add a brief description, for example:
This is the processPayment function. It calls validateCard (which validates Stripe cards) and chargeStripe (which handles the API call). I need to add retry logic.
That's ~200 tokens instead of 3,000 tokens for the full file, and the model has the context it needs.
3. Use the /clear command strategically
In Claude Code, /clear resets the conversation history without closing the session. Use it when you've finished a sub-task and want to start fresh without starting a completely new session.
4. Watch for file-reading loops
If you notice Claude reads the same file multiple times in a session, that's a sign the context isn't being managed well. Each read adds tokens. Use targeted file references instead of repeated reads.
5. Set token budget alerts
Most API dashboards let you set spending alerts. Set one at your expected monthly budget and another at 1.5x. Alerts surface burning behavior before it becomes a problem, not after.
The Automated Approach
Manual discipline helps, but it's friction. The more sustainable fix is automated context extraction that solves the problem architecturally.
Graph-based context engines index your codebase and extract exactly the relevant code for each query. Instead of session history, the model gets a fresh, precise slice of your codebase every time. No accumulated noise, no stale file dumps.
In benchmarked tests using a FastAPI codebase (7 representative tasks, 21 runs per condition, Claude 3.5 Sonnet): automated context extraction achieved ~58% cost reduction with ~63% fewer output tokens and ~22% faster task completion. The reduction came from providing less noise, which let the model produce more precise, shorter answers.
See: Context Engineering for AI Coding Agents.
Understanding Your Claude Bill
Quick reference for Claude 3.5 Sonnet pricing (as of late 2024):
| Component | Value |
|-----------|-------|
| Input tokens | $3 / 1M tokens |
| Output tokens | $15 / 1M tokens |
| Typical session input | 30,000–80,000 tokens |
| Typical session output | 5,000–15,000 tokens |
| Cost per heavy session | $0.17–$0.47 |
A developer doing 20 heavy sessions per day: $3.40–$9.40/day, or roughly $70–$190/month. That's around where the "this is expensive" threshold sits for many teams.
With precision context, the same developer's input tokens can drop by 65–70%, bringing that to roughly $25–$80/month. Same work, lower bill.
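The arithmetic behind the table is worth checking yourself. Here it is as code, at the prices quoted above:

```typescript
// Claude 3.5 Sonnet API prices from the table:
// $3 per 1M input tokens, $15 per 1M output tokens.
const INPUT_PRICE_PER_TOKEN = 3 / 1_000_000;
const OUTPUT_PRICE_PER_TOKEN = 15 / 1_000_000;

// Dollar cost of a single session given its token counts.
function sessionCost(inputTokens: number, outputTokens: number): number {
  return (
    inputTokens * INPUT_PRICE_PER_TOKEN +
    outputTokens * OUTPUT_PRICE_PER_TOKEN
  );
}

// Heavy end of the table: 80,000 input + 15,000 output
// = $0.24 + $0.225 = $0.465 per session, the top of the $0.17-$0.47 range.
```

Notice that output tokens dominate less than you might expect: even at 5x the price, 15,000 output tokens cost about the same as 75,000 input tokens, which is why trimming accumulated input context moves the bill so much.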
FAQ
Q: Does using Claude Code on the Max plan change this?
The Max plan (e.g., $100/month) is a flat subscription rather than per-token API billing, but your usage allowance is still measured in tokens, so the mechanics are identical — context accumulation still drives how fast you burn through it. What changes is that the cost is capped; instead of a bigger bill, you hit rate limits sooner.
Q: Does longer context actually hurt quality?
Often yes, past a certain point. Research on the "lost in the middle" phenomenon shows models give less attention to content buried in the middle of very long contexts. More context isn't always better context.
Q: Is there a way to see exactly what's in Claude's context window?
In Claude Code, you can't directly inspect the full context, but you can estimate it from the conversation length and any files referenced. The API does report input token counts per call.
Q: Should I use Claude Haiku instead of Sonnet to save money?
For simple tasks and exploratory questions, Haiku can be much cheaper at comparable quality. For complex code generation and debugging, Sonnet's quality improvement usually justifies the cost. The bigger win is context efficiency, which reduces cost for any model tier.
Q: What about caching?
Claude supports prompt caching, which can reduce repeated content costs by ~90%. If your system prompt and project context are stable, caching them is high-value. But caching helps with the fixed overhead, not with the variable accumulation of conversation history.
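As a sketch, marking a stable system prompt as cacheable in an Anthropic Messages API request looks roughly like this. Field names follow Anthropic's prompt caching feature as of late 2024, and the model string and prompt text are placeholders; verify against the current API reference before relying on this:

```typescript
// Request body shape with the stable system prompt marked cacheable.
// Only the marked block is cached; the per-turn messages are not.
const request = {
  model: "claude-3-5-sonnet-20241022", // example model string
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: "<stable project context, e.g. your CLAUDE.md contents>",
      cache_control: { type: "ephemeral" }, // reused across calls at a steep discount
    },
  ],
  messages: [
    { role: "user", content: "Add retry logic to processPayment." },
  ],
};
```

This is exactly the "fixed overhead" case from the answer above: the cached system block is cheap on every call, while the growing `messages` history is still billed at full input rates.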

```typescript
// Example: naive vs precision context for a small change
// ❌ Naive: send the entire file every time
//    (~300 lines, ~3,000 tokens per turn)
// ✅ Precision: send only what matters

// Illustrative supporting type; the real definition lives elsewhere
// in the codebase, but its shape is part of the precision slice
interface PaymentInput {
  card: { number: string; exp: string };
  amountCents: number;
}

interface ChargeResult {
  success: boolean;
  errorCode?: string;
}

// Direct dependencies, included as declarations so the model
// sees the signatures without the full implementations
declare function validateCard(card: PaymentInput["card"]): { ok: boolean };
declare function chargeStripe(input: PaymentInput): Promise<ChargeResult>;

async function processPayment(input: PaymentInput): Promise<ChargeResult> {
  const validated = validateCard(input.card);
  if (!validated.ok) {
    return { success: false, errorCode: "card_invalid" };
  }
  // TODO: add retry logic around chargeStripe
  return chargeStripe(input);
}

// Prompt suggestion:
// "This is processPayment. It calls validateCard (Stripe card validation)
// and chargeStripe (Stripe API call). Add retry logic with exponential
// backoff around chargeStripe, up to 3 attempts."
```
Nicola
Developer and creator of vexp — a context engine for AI coding agents. I build tools that make AI coding assistants faster, cheaper, and actually useful on real codebases.
Related Articles

Vibe Coding Is Fun Until the Bill Arrives: Token Optimization Guide
Vibe coding with AI is addictive but expensive. Freestyle prompting without context management burns tokens 3-5x faster than structured workflows.

Windsurf Credits Running Out? How to Use Fewer Tokens Per Task
Windsurf credits deplete fast because the AI processes too much irrelevant context. Reduce what it needs to read and your credits last 2-3x longer.

Best AI Coding Tool for Startups: Balancing Cost, Speed, and Quality
Startups need speed and budget control. The ideal AI coding stack combines a free/cheap agent with context optimization — here's how to set it up.