"Claude Code Spending Too Much" — Fixing the #1 Developer Complaint

Nicola

"Why is Claude Code so expensive?" For most teams, the answer isn’t a bug in Claude Code — it’s unmanaged context.

When you let conversation history grow unchecked, paste full files instead of relevant snippets, and run verbose tools, your token usage explodes. The good news: this is fixable, and the savings are predictable.

This guide explains:

  • Why Claude Code sessions get expensive
  • The five behaviors that spike your bill
  • How much each fix actually saves (with numbers)
  • How to automate context management with vexp
  • When vexp is worth it vs. behavior changes alone

Why Claude Code Gets Expensive: The Token Breakdown

Every Claude API call is billed on input tokens (everything in the context window) and output tokens (the model's response). Claude Code makes many calls per session, and because the full context is re-sent on each call, most of the cost comes from what you load into context.

In a typical unoptimized session, token usage breaks down roughly like this:

Conversation history: 40–50%

Every message you send and every response you get stays in the context window. A long debugging session with ~30 back-and-forth messages can accumulate 40,000+ tokens of conversation alone, before you even load a file. As the session continues, this history compounds.

File contents: 30–40%

When Claude reads files, the entire file is usually added to context. A typical backend file (200–400 lines) is about 1,500–3,000 tokens. Load 10 such files and you’ve spent 15,000–30,000 tokens just on file content — much of which may be irrelevant to your current question.

System context: 10–15%

Things like CLAUDE.md, project instructions, and MCP server metadata usually add 2,000–5,000 tokens per session.

Tool call results: 5–10%

Outputs from commands (npm test, git log, search tools, etc.) can be huge if you run them verbosely.

The key insight: conversation history + file contents usually account for 70–90% of your most expensive sessions — and both are controllable.
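To make the breakdown concrete, here is a minimal cost sketch in Python. The per-token prices and the token split below are illustrative placeholders, not actual Claude pricing:

```python
# Rough per-session cost model. Prices and proportions are
# illustrative placeholders, not actual Claude pricing.
INPUT_PRICE_PER_1K = 0.003   # hypothetical $ per 1K input tokens
OUTPUT_PRICE_PER_1K = 0.015  # hypothetical $ per 1K output tokens

def session_cost(input_tokens: int, output_tokens: int) -> float:
    """Total cost of one session in dollars."""
    return (input_tokens / 1000 * INPUT_PRICE_PER_1K
            + output_tokens / 1000 * OUTPUT_PRICE_PER_1K)

# An unoptimized session: 100K input tokens split roughly as above.
breakdown = {
    "conversation_history": 45_000,  # 40-50%
    "file_contents": 35_000,         # 30-40%
    "system_context": 12_000,        # 10-15%
    "tool_results": 8_000,           # 5-10%
}
total_input = sum(breakdown.values())
print(session_cost(total_input, output_tokens=10_000))
```

Note that at this split, trimming conversation history and file contents attacks 80% of the input bill.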

The Five Behaviors That Spike Your Bill

1. Long, Wandering Sessions

Using Claude Code as a single, never-ending chat thread is the fastest way to overspend.

When you:

  • Debug three unrelated bugs
  • Explore a new feature
  • Paste logs and files from multiple services

…all in one session, the entire history stays in context. None of it is pruned automatically.

A developer who runs one 3-hour session will typically spend 3–5× more tokens than a developer who runs three focused 1-hour sessions for the same total work.

Fix: One session per task.

Start a fresh session for each distinct bug, feature, or investigation.
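The compounding effect can be sketched in a few lines of Python. Assuming a flat ~1,000 tokens per message and that each call re-sends the full history, one 30-message session costs roughly 3× the cumulative input tokens of three 10-message sessions:

```python
# Why long sessions compound: every API call re-sends the whole
# history, so cumulative input tokens grow roughly quadratically
# with message count. All numbers are illustrative.
def cumulative_input_tokens(messages: int, tokens_per_message: int = 1_000) -> int:
    """Sum of context sizes across all calls in one session."""
    return sum(i * tokens_per_message for i in range(1, messages + 1))

one_long = cumulative_input_tokens(30)          # one 30-message session
three_short = 3 * cumulative_input_tokens(10)   # three 10-message sessions
print(one_long, three_short, one_long / three_short)
```

The gap widens further in real sessions, where loaded files and tool output also ride along in every call.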

2. Loading Full Files When You Only Need Functions

Pasting or loading entire files like:

"Can you look at this file?" (400+ lines)

…is expensive when you only need help with a 30-line function.

Once loaded, that full file:

  • Stays in context for the rest of the session
  • Gets re-sent on every subsequent API call

Fix:

  • Use tools that extract only relevant code (functions, snippets, call sites)
  • Or, at minimum, paste only the relevant function or block, not the whole file
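For Python codebases, snippet extraction can be as simple as an AST walk. This is a sketch (the file contents and function names are made up), but it shows the idea: send the one function, not the file.

```python
import ast

def extract_function(source: str, name: str) -> str:
    """Return only the named function's source instead of the whole file."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == name:
            return ast.get_source_segment(source, node)
    raise ValueError(f"function {name!r} not found")

# Hypothetical 400-line file, reduced to the two definitions that matter here.
big_file = '''
def helper():
    return 1

def authenticate(user, password):
    # the 30-line function you actually need help with
    return user == "admin" and password == "s3cret"
'''
snippet = extract_function(big_file, "authenticate")
print(snippet)
```

A ~3,000-token file shrinks to a few hundred tokens, and nothing irrelevant lingers in context for the rest of the session.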

3. Repeated Context Re-Loading

Another common pattern:

  1. Start a new session
  2. Paste the same architecture notes, conventions, and key files you pasted yesterday

You’re paying again for static context that hasn’t changed.

Fix:

  • Put static project context in CLAUDE.md so it loads automatically
  • Use session memory for dynamic but reusable context (e.g., current feature spec)

This way, you don’t have to keep re-pasting the same information.

4. Verbose Tool Calls

Commands like:

  • npm test when 200 tests are failing
  • git log --all --full-diff

…can easily dump 10,000+ tokens of output into your context.

Claude Code faithfully includes that output in subsequent calls, inflating your token usage.

Fix: Prefer targeted, quiet commands:

  • npm test -- --testNamePattern="auth"
  • git log -n 5
  • Use --quiet / -q flags where available
  • Pipe exploratory commands to head -n 50

Claude Code includes whatever output a command produces. If you keep outputs small, your context stays lean.
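If you script tool invocations yourself, a small truncation helper caps any command's output before it reaches the context. A minimal sketch (the line and character limits are arbitrary defaults):

```python
def truncate_output(text: str, max_lines: int = 50, max_chars: int = 4_000) -> str:
    """Keep tool output small before it enters the context window."""
    lines = text.splitlines()
    if len(lines) > max_lines:
        omitted = len(lines) - max_lines
        lines = lines[:max_lines] + [f"... ({omitted} more lines omitted)"]
    out = "\n".join(lines)
    return out[:max_chars] if len(out) > max_chars else out

# Simulated verbose test run: 200 failing tests.
noisy = "\n".join(f"FAIL test_{i}" for i in range(200))
print(truncate_output(noisy))
```

The first failures are usually enough to diagnose the problem; the omission marker tells the model there was more.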

5. Speculative File Loading

It’s natural to think:

"Let me load all the relevant files first, just in case."

But pre-loading multiple large files before you know what’s actually needed is one of the most expensive habits.

Fix: Load context on demand:

  • Let Claude ask for specific files or functions as needed
  • Or use a context engine (like vexp) that automatically loads only the most relevant snippets within a fixed token budget

The Numbers: What Each Fix Actually Saves

Here’s a rough comparison of typical token costs per session, before and after applying each fix:

| Behavior | Typical token cost | With fix | Savings |
|------------------------|--------------------|----------|---------|
| Long sessions (3 hr+) | 80,000 | 25,000 | ~69% |
| Full file loading | 20,000 | 4,000 | ~80% |
| Repeated re-loading | 8,000 | 1,000 | ~88% |
| Verbose tool calls | 15,000 | 3,000 | ~80% |
| Speculative loading | 12,000 | 2,500 | ~79% |

These savings overlap, so you can’t just add them up. In practice, disciplined session and context management typically yields a 30–40% reduction in total token usage.

The Automated Fix: Context Engineering

Manual discipline caps out around 30–40% savings because:

  • Long sessions are sometimes convenient
  • You often don’t know which files matter up front
  • Under time pressure, it’s easy to paste whole files or run verbose commands

A context engine automates the hard part: selecting and compressing only the most relevant context into a tight token budget.

How a Context Engine Works

When you describe a task, a context engine:

  1. Searches your codebase (e.g., via embeddings + graph ranking)
  2. Identifies the most relevant files, functions, and call sites
  3. Extracts only the necessary snippets
  4. Compresses them into a small, ranked context capsule
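Step 4 boils down to a budgeted packing problem. A minimal sketch, assuming the engine has already produced relevance scores and token counts for candidate snippets (all values below are made up):

```python
# Minimal sketch of step 4: pack the highest-ranked snippets into a
# fixed token budget. Scores and token counts would come from the
# engine's search and ranking stages (steps 1-3).
def build_capsule(snippets, budget_tokens):
    """snippets: list of (score, token_count, text). Returns (texts, tokens used)."""
    capsule, used = [], 0
    for score, tokens, text in sorted(snippets, reverse=True):
        if used + tokens <= budget_tokens:
            capsule.append(text)
            used += tokens
    return capsule, used

snippets = [
    (0.92, 800, "def authenticate(...): ..."),
    (0.85, 1200, "class SessionStore: ..."),
    (0.40, 3000, "entire utils module"),
    (0.10, 2500, "unrelated migration script"),
]
capsule, used = build_capsule(snippets, budget_tokens=3000)
print(used, len(capsule))
```

The budget is a hard ceiling: low-relevance bulk like the utils module never makes it into context, no matter how long the session runs.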

Frequently Asked Questions

Why does Claude Code cost so much per session?
Most Claude Code cost comes from input tokens — conversation history (40–50%) and file contents (30–40%) make up 70–90% of expensive sessions. As sessions grow longer and more files are loaded, every subsequent API call re-sends all that accumulated context, compounding the cost.
What is the fastest way to reduce Claude Code token spending?
Start a fresh session for each distinct task instead of running long, multi-topic sessions. A single 3-hour session typically costs 3–5× more tokens than three focused 1-hour sessions for the same work. Combined with loading only relevant code snippets instead of full files, this alone can cut costs by 30–40%.
How much can a context engine save on Claude Code costs?
A context engine like vexp automates context selection using dependency graph traversal, typically reducing input tokens by 58–65% compared to manual file loading. It searches your codebase, identifies only the relevant functions and call sites, and compresses them into a tight token budget — going beyond what manual discipline alone can achieve.
Does conversation history really affect Claude Code costs?
Yes, significantly. Conversation history alone accounts for 40–50% of token usage in typical sessions. Every message you send and every response stays in the context window and gets re-sent on every subsequent API call. A 30-message debugging session can accumulate 40,000+ tokens of conversation before any files are loaded.
Can I reduce Claude Code costs without changing my workflow?
Manual workflow changes (shorter sessions, smaller file snippets, quieter commands) cap out around 30–40% savings. To achieve a 58–65% reduction automatically, you need a context engine that pre-indexes your codebase and serves only relevant code per task. Tools like vexp handle this via MCP integration, requiring no manual context management.

Nicola

Developer and creator of vexp — a context engine for AI coding agents. I build tools that make AI coding assistants faster, cheaper, and actually useful on real codebases.
