How to Eliminate Irrelevant Context in AI Coding Sessions

Every AI coding session has two kinds of context: the stuff that helps Claude give better answers, and the stuff that just takes up space. Most developers assume they're loading mostly relevant context. The data says otherwise.
In a typical Claude Code session on a mid-size codebase, 65–80% of input tokens are irrelevant to the current task. That's not a rough estimate — it comes from comparing what context was loaded versus what context was actually referenced in generating the response.
This article breaks down where irrelevant context comes from and exactly how to reduce it.
What Makes Context "Irrelevant"?
Context is irrelevant when it was loaded into the session but didn't influence Claude's response to the current task.
A few common categories:
1. Exploration debris
Files Claude read to understand the codebase during exploration, but which weren't needed for the specific task.
Example: Claude reads config.ts while exploring your project structure, but the task is "fix the pagination bug in UserList.tsx". If config.ts never factors into the fix, it's exploration debris.
2. Cross-task residue
Context from earlier in the session that was relevant to a previous task but not the current one.
Example: Turn 3 was about auth, turn 12 is about the data layer — all the auth files are still in context but contribute nothing to the data layer question.
3. Dependency over-loading
Files loaded because they're imports of imports.
Example: UserList.tsx imports types.ts, which imports base-types.ts, which imports constants.ts. Claude may load all four to understand types, but the task only needed UserList.tsx and types.ts.
4. Redundant re-reads
The same file loaded multiple times in a session because it appears in multiple import chains.
Example: utils.ts might be in 15 files' import trees; Claude may have loaded it several times.

Where Irrelevant Context Comes From
1. The Exploration Tax (Primary Source)
Every new request in a session triggers exploration. Claude doesn't know exactly what's needed until it reads enough to understand the problem space. This exploration is breadth-first — read the obvious file, then its imports, then their imports in turn.
For a medium-complexity task in a medium-size codebase:
- Files directly needed: 3–5
- Files loaded for exploration: 10–20
- Relevant fraction: 15–35%
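The breadth-first pattern above can be simulated to show why the file count balloons. A minimal sketch with a made-up import graph (all file names are hypothetical, not from a real project):

```python
from collections import deque

# Hypothetical import graph: file -> files it imports.
imports = {
    "UserList.tsx": ["types.ts", "api.ts"],
    "types.ts": ["base-types.ts"],
    "base-types.ts": ["constants.ts"],
    "api.ts": ["config.ts", "utils.ts"],
    "config.ts": [],
    "utils.ts": [],
    "constants.ts": [],
}

def breadth_first_exploration(entry):
    """Simulate exploration: read the entry file, then its imports,
    then those files' imports, until the frontier is exhausted."""
    seen, queue = set(), deque([entry])
    while queue:
        f = queue.popleft()
        if f in seen:
            continue
        seen.add(f)
        queue.extend(imports.get(f, []))
    return seen

explored = breadth_first_exploration("UserList.tsx")
directly_needed = {"UserList.tsx", "types.ts"}  # what the task actually touches
print(len(explored), len(directly_needed))  # → 7 2
```

Even in this tiny graph, exploration loads seven files when the task needs two — a relevant fraction of under 30%, matching the range above.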
The exploration tax is highest for:
- Large codebases with deep dependency trees
- Ambiguous requests ("fix the auth bug" vs "fix the null check in validateToken in auth/jwt.ts")
- Tasks that span multiple subsystems
2. Session Accumulation (Secondary Source)
A session that spans multiple tasks accumulates context from all of them. Each task's exploration debris stays in the context window. By turn 10, you might have 30,000 tokens of context from turns 1–9 that is pure overhead for turn 10.
The accumulation rate depends on:
- How many distinct tasks you tackle in a session
- How large each task's context footprint is
- How much overlap there is between tasks (overlapping context is less wasteful)
For a 3-hour session spanning 5 unrelated features, session accumulation can easily be 50% of total context by the end.
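The accumulation claim above can be put in back-of-envelope form. The figures below are illustrative, not measured:

```python
def accumulated_overhead(task_footprints, overlap=0.0):
    """Context tokens left behind by earlier tasks in a session.
    `overlap` is the fraction of each task's footprint shared with
    the current task; shared context isn't pure waste."""
    return sum(tokens * (1 - overlap) for tokens in task_footprints)

# Five unrelated features at ~12,000 tokens of context each:
print(accumulated_overhead([12_000] * 5))       # zero overlap: full residue
print(accumulated_overhead([12_000] * 5, 0.3))  # 30% shared context
```

With no overlap, five completed tasks leave 60,000 tokens of residue — easily half of a large context window by session end.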
3. Context File Overhead (Tertiary Source)
CLAUDE.md and similar configuration files add tokens to every single request, whether or not their content is relevant.
Example: A 2,000-token CLAUDE.md loaded 50 times in a 50-request session is 100,000 tokens of overhead, most of it irrelevant to most requests.

Measurement: How Much Is Actually Irrelevant?
You can get a rough measure by comparing:
- Total input tokens for a request (available in API response or Claude Code logs)
- What context Claude explicitly references in its response
For a simple task like "add a null check to getUser in user-service.ts":
- Typical total input tokens: 25,000–40,000 (depending on codebase size)
- Tokens Claude actually references: 2,000–5,000 (the function, its immediate types, existing error handling patterns)
- Relevant fraction: 5–20%
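The comparison above reduces to a one-line ratio. A minimal sketch using the example figures:

```python
def relevant_fraction(total_input_tokens, referenced_tokens):
    """Share of input tokens that Claude's response actually drew on."""
    return referenced_tokens / total_input_tokens

# Figures from the getUser example above:
worst = relevant_fraction(40_000, 2_000)  # large codebase, little referenced
best = relevant_fraction(25_000, 5_000)   # smaller codebase, more referenced
print(f"{worst:.0%}-{best:.0%}")  # → 5%-20%
```

The referenced-token count is necessarily an estimate — you're counting the files, functions, and patterns the response explicitly cites — but even a rough count makes the scale of the waste visible.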
Benchmark data from real-world FastAPI tasks showed that context engineering reduced input tokens by 65% with no loss in task completion quality. This means at least 65% of baseline input tokens were irrelevant — and the actual number was probably higher, since some of the remaining 35% was also marginally relevant at best.
How to Eliminate Irrelevant Context: The Three Layers
Layer 1: Pre-Indexing (Structural Fix)
The only way to structurally eliminate exploration-based irrelevant context is to replace exploration with pre-computed retrieval.
Instead of Claude exploring your codebase breadth-first at request time, a pre-indexed system:
- Builds a dependency graph and semantic index of your codebase ahead of time
- At request time, queries the graph for the minimum spanning context: the exact set of files and code sections relevant to this specific task
- Returns a compressed capsule of 3,000–8,000 tokens instead of 25,000–40,000 tokens of exploration results
This is what vexp's run_pipeline does in practice. The graph-ranked context it returns contains what's needed, not what got picked up during exploration.
Impact: 55–70% reduction in input tokens per request.
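As a toy illustration of the lookup — not vexp's actual algorithm, which also ranks by semantic relevance — a pre-built dependency graph lets a request fetch its named files plus one hop of dependencies instead of the full transitive tree. All file names below are hypothetical:

```python
# Toy pre-computed index: file -> direct dependencies.
dependency_graph = {
    "UserList.tsx": ["types.ts"],
    "types.ts": ["base-types.ts"],
    "user-service.ts": ["types.ts", "db.ts"],
}

def minimal_context(task_files, hops=1):
    """Return the task's files plus dependencies up to `hops` levels,
    rather than the full transitive import tree."""
    context = set(task_files)
    frontier = set(task_files)
    for _ in range(hops):
        frontier = {dep for f in frontier for dep in dependency_graph.get(f, [])}
        context |= frontier
    return context

print(sorted(minimal_context(["UserList.tsx"])))
# includes types.ts but stops before base-types.ts
```

Because the graph is built ahead of time, this lookup costs no exploration tokens at request time — the selection work has already been paid for once.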
Layer 2: Session Hygiene (Behavioral Fix)
For the session accumulation problem, the structural fix is starting new sessions. But within a session, you can reduce accumulation with better hygiene.
Use /compact at session midpoints
/compact summarizes earlier conversation, replacing detailed file content with condensed summaries. The summary retains key insights at lower token cost.
- Best for long sessions spanning related tasks
- Keeps the important decisions and patterns, drops the line-by-line code you've already applied
Use /clear between unrelated task groups
Complete one feature area, /clear, start the next.
- You lose some familiarity
- You eliminate cross-task residue that would otherwise bloat future requests
Scope requests within a session
Once the session has accumulated context for task A, task B requests that explicitly name the relevant files don't need to trigger fresh exploration.
Example: "In the context of what we discussed about auth earlier" keeps the conversation grounded without loading new context.
Impact: 15–30% reduction in session-total input tokens.
Layer 3: Prompt Architecture (Request-Level Fix)
How you phrase requests influences how much exploration they trigger.
Specific > General
- General: "Fix the user authentication bug" → triggers broad auth system exploration
- Specific: "Fix the null check in validateToken in src/auth/jwt.ts line 47" → loads that file and direct dependencies only
Bounded > Unbounded
- Unbounded: "How does the payment system work?" → loads all payment-adjacent files
- Bounded: "What does processPayment return and what error states can it produce?" → loads the function and its error types
Reference existing context
If you've already discussed a module earlier in the session, reference that discussion explicitly.
Example: "Based on the UserService we looked at earlier, how should I structure the audit logging?" This signals Claude to use existing context rather than re-exploring.

Impact: 10–20% reduction in per-request input tokens for well-scoped requests.
Combined: What's Actually Achievable?
| Approach | Irrelevant context eliminated | Effort |
|--------------------------|-------------------------------|--------------|
| Pre-indexing (structural)| 55–70% | Low (once) |
| Session hygiene | 15–30% | Medium |
| Prompt architecture | 10–20% | High |
| All three combined | 65–75% | Medium overall |
The diminishing returns of adding more optimization:
- Pre-indexing delivers most of the gain at the lowest ongoing effort
- Session hygiene adds meaningful improvement
- Prompt architecture is worth doing but requires the most discipline for the smallest marginal gain
For most developers, pre-indexing + basic session hygiene (new sessions for new features) delivers 60–70% of achievable gains with reasonable effort.
The Quality Side Effect
Reducing irrelevant context doesn't just reduce costs — it tends to improve response quality.
This is counterintuitive: more context should mean more information, right?
The problem is that irrelevant context is noise. When Claude has to process 35,000 tokens of context to answer a 3,000-token question, the attention mechanism gets diluted. The model can struggle to weight the relevant 3,000 tokens appropriately when surrounded by 32,000 tokens of tangentially related material.
Compressed, relevant context means cleaner signal. The benchmark data showed 14 percentage points higher task completion rate with context engineering — not just lower tokens, but better answers.
Getting Started
- Measure your baseline.
- Run 5–10 typical tasks
- Record input token counts per request
- This tells you the scale of the irrelevant context problem in your workflow
- Install vexp.
- npm install -g vexp-cli && vexp init in your project root
- Configure Claude Code to use run_pipeline before file exploration
- Update your session habits.
- Start new sessions for genuinely new task areas
- Use /compact for long sessions before switching contexts
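The baseline measurement in step 1 can be as simple as collecting per-request input-token counts and summarizing them. The counts come from the `usage` field of API responses or from Claude Code's session logs; the sample numbers below are invented:

```python
import statistics

def summarize_baseline(input_token_counts):
    """Summarize input-token counts recorded over 5-10 typical tasks."""
    return {
        "median": statistics.median(input_token_counts),
        "total": sum(input_token_counts),
        "max": max(input_token_counts),
    }

print(summarize_baseline([28_000, 31_500, 26_200, 39_800, 30_100]))
```

Re-run the same summary after enabling pre-indexing and compare medians — the difference is your eliminated irrelevant context.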
Nicola
Developer and creator of vexp — a context engine for AI coding agents. I build tools that make AI coding assistants faster, cheaper, and actually useful on real codebases.