Context Engineering for AI Coding Agents: The Complete Guide

Nicola

The first time I realized prompt engineering was the wrong abstraction for AI coding, I was watching Claude load 47 files to answer a question about my auth module. The question was dead simple: “how does JWT validation work in this service?” The answer had nothing to do with at least 40 of the 47 files Claude pulled in.

That experience changed how I think about AI coding tools. The problem wasn't the model. It wasn't the prompt. It was the context.

Context engineering for AI coding agents is the practice of controlling what information an agent receives before it generates a response. Not prompts—the instructions you carefully craft. Context—the codebase, history, and state the agent operates on. It's a different layer entirely, and in practice, it matters more than almost everything else.

Why Prompt Engineering Isn't Enough for Coding Agents

Prompt engineering works well for one-shot tasks: summarize this, translate that, classify this input. But coding agents are stateful. They operate on large codebases, across multiple sessions, with dependencies that span dozens of files.

When you ask Claude Code or Cursor to fix a bug in your service layer, the agent doesn't just read your prompt. It reads your entire codebase context—or as much as fits in the window. That's where things break down.

The average production codebase has 50,000–200,000 lines of code. A context window holds maybe 100,000 tokens. A single file can be 1,000+ lines. So the agent makes choices about what to include—and those choices are often wrong.
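The mismatch is easy to see with back-of-envelope arithmetic. The tokens-per-line ratio below is an assumption for illustration, not a measurement:

```python
# Rough arithmetic for the codebase vs. context-window mismatch.
# TOKENS_PER_LINE is an assumed average for typical source code.
TOKENS_PER_LINE = 10
CONTEXT_WINDOW = 100_000  # tokens


def fraction_that_fits(lines_of_code: int) -> float:
    """Fraction of a codebase that fits in one context window."""
    total_tokens = lines_of_code * TOKENS_PER_LINE
    return min(1.0, CONTEXT_WINDOW / total_tokens)


for loc in (50_000, 200_000):
    print(f"{loc:>7} LOC -> {fraction_that_fits(loc):.0%} of the codebase fits")
```

Under these assumptions, only 5–20% of a production codebase fits in the window at once, so something has to choose what gets included.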

I've seen Claude include CSS files while debugging a database query. I've seen Cursor load test fixtures when modifying production logic. Not because the models are bad, but because the context selection is dumb. It's just file proximity and recency, not semantic relevance.

Context engineering is the discipline of fixing this. It's about building systems that understand your codebase's structure and serve agents exactly the context they need—nothing more, nothing less.

The Three Pillars of Context Engineering

After building and benchmarking several approaches, I've settled on three things that actually move the needle.

1. Dependency graphs, not file trees

The most important shift is replacing file-based context with dependency-based context. A file tree tells you where code lives. A dependency graph tells you how code connects.

When Claude is editing your auth.service.ts, what it actually needs is:

  • The interfaces it implements
  • The modules it imports
  • The utilities it calls
  • The callers that depend on it

That's a graph traversal problem, not a file search problem.

Building this graph requires static analysis. Tools like tree-sitter can parse 11+ languages into ASTs (Abstract Syntax Trees), extract every function definition, class, import, and call relationship, and store them in a queryable structure. From there, you run graph algorithms to find the minimal set of code nodes actually relevant to what the agent is doing—the "pivot nodes" that connect the current edit point to the context that matters.
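To keep a sketch dependency-free, here is the extraction step using Python's stdlib `ast` module on Python source; a tree-sitter pipeline does the same walk, just over language-agnostic syntax trees. The module name and sample source are illustrative:

```python
import ast
from collections import defaultdict


def extract_edges(source: str, module: str) -> dict[str, set[str]]:
    """Extract import and call edges from one module's source.

    Single-language sketch of the static-analysis step: walk the AST
    and record every import and call as an outgoing edge.
    """
    tree = ast.parse(source)
    edges: dict[str, set[str]] = defaultdict(set)
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                edges[module].add(alias.name)      # plain import edge
        elif isinstance(node, ast.ImportFrom) and node.module:
            edges[module].add(node.module)         # from-import edge
        elif isinstance(node, ast.Call):
            fn = node.func
            if isinstance(fn, ast.Attribute):
                edges[module].add(fn.attr)         # method/attribute call
            elif isinstance(fn, ast.Name):
                edges[module].add(fn.id)           # direct function call
    return dict(edges)


src = "import jwt\nfrom models import User\n\ndef check(t):\n    return jwt.decode(t)\n"
print(extract_edges(src, "auth_service"))  # edges to jwt, models, decode
```

A real implementation resolves these raw names to symbol definitions before storing them, but the shape of the data (symbol, edge, symbol) is the same.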

This is fundamentally different from vector search. Embeddings are fuzzy and approximate. Dependency graphs are deterministic. If function A calls function B, that's a fact—not a similarity score.
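The retrieval step is then an ordinary graph traversal. A minimal sketch, with a hand-built toy graph (real systems also follow reverse edges to pick up callers):

```python
from collections import deque


def relevant_nodes(graph: dict[str, list[str]], start: str, max_hops: int = 2) -> set[str]:
    """BFS from the edit point: every node within max_hops is context."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen


# Toy graph: each edge is a fact (A imports/calls B), not a similarity score.
graph = {
    "auth.service": ["jwt.utils", "user.repo"],
    "user.repo": ["db.session"],
    "billing.service": ["db.session"],  # unrelated to the edit point
}
print(sorted(relevant_nodes(graph, "auth.service")))
```

Run twice with the same graph, you get the same answer. That determinism is exactly what embeddings can't give you.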

2. Session memory linked to the code

Here's the thing most people miss: context isn't just about the current query. It's about everything the agent has learned in this session—and in previous sessions.

By default, every session with Claude Code or Cursor starts from zero. The agent has no memory of what it explored last time, what decisions were made, what approaches failed, or what architectural constraints exist. You re-explain the same context every time.

Real context engineering addresses this with session memory that's linked to the code graph. Not just “save these notes”—but observations attached to specific code symbols, automatically flagged as stale when those symbols change.

Example: if you save an observation that “the auth module uses a custom JWT library instead of the standard one,” and someone later modifies auth.service.ts, that observation gets flagged automatically. The agent doesn't act on outdated information.
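The staleness check can be as simple as hashing the symbol's source at save time and re-hashing on change. A minimal sketch (class and field names are illustrative, not vexp's actual schema):

```python
import hashlib
from dataclasses import dataclass, field


def fingerprint(source: str) -> str:
    """Content hash of a symbol's current source text."""
    return hashlib.sha256(source.encode()).hexdigest()


@dataclass
class Observation:
    symbol: str        # e.g. "auth.service::validateToken"
    note: str
    symbol_hash: str   # hash of the source the note was written against
    stale: bool = False


@dataclass
class SessionMemory:
    observations: list[Observation] = field(default_factory=list)

    def remember(self, symbol: str, note: str, source: str) -> None:
        self.observations.append(Observation(symbol, note, fingerprint(source)))

    def refresh(self, symbol: str, current_source: str) -> None:
        """Flag observations whose underlying symbol has changed."""
        for obs in self.observations:
            if obs.symbol == symbol and obs.symbol_hash != fingerprint(current_source):
                obs.stale = True


mem = SessionMemory()
mem.remember("auth.service", "uses a custom JWT library", "def validate(): ...  # v1")
mem.refresh("auth.service", "def validate(): ...  # v2")  # symbol changed
print(mem.observations[0].stale)  # True: don't hand this note to the agent
```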

3. Token budget management

The benchmarks are sobering. On a real FastAPI codebase—7 development tasks, 21 runs per arm, using Claude Sonnet—context engineering:

  • Reduced token usage by 65–70%
  • Cut cost by 58%
  • Dropped output tokens by 63%
  • Made tasks complete 22% faster

The mechanism is simple: smaller, more relevant context means fewer input tokens. Fewer input tokens means the model has less noise to work through, which means better output with fewer output tokens too.
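One way to enforce the budget is a greedy pack: rank candidates by graph distance from the edit point, then fill the window until the budget is spent. A sketch under that assumption (file names and token counts are made up):

```python
def pack_context(candidates: list[tuple[str, int, int]], budget: int) -> list[str]:
    """Greedy budget packing: closest graph distance first, then cheapest.

    candidates: (name, graph_distance, token_cost) triples.
    """
    chosen, spent = [], 0
    for name, _dist, cost in sorted(candidates, key=lambda c: (c[1], c[2])):
        if spent + cost <= budget:
            chosen.append(name)
            spent += cost
    return chosen


candidates = [
    ("user_service.py", 1, 3_000),
    ("models/login_request.py", 1, 800),
    ("db/session.py", 2, 1_200),
    ("routers/admin.py", 4, 5_000),  # far from the edit point
]
print(pack_context(candidates, budget=5_500))
```

The distant, expensive file never makes the cut. That pruning is where the input-token savings come from.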

The counterintuitive truth: giving your AI coding agent less information (but better information) consistently outperforms giving it more.

What Context Engineering Looks Like in Practice

Here's a concrete example. You're editing a FastAPI endpoint that handles user authentication:

`auth_router.py`:

```python
@router.post("/login")
async def login(credentials: LoginRequest, db: AsyncSession = Depends(get_db)):
    user = await user_service.authenticate(credentials.email, credentials.password)
    ...
```

Without context engineering, your agent might load:

  • The entire routers/ directory (30 files)
  • All models/ (20 files)
  • The test suite for the module (15 files)
  • Random utilities that happened to be recently edited

With dependency-graph context engineering, your agent gets:

  • user_service.py (direct dependency)
  • LoginRequest model definition
  • get_db dependency definition
  • The interfaces those implement
  • Any callers that use the same patterns

Total: 6–8 files instead of 65. The agent has everything it needs and nothing it doesn't.

Does Context Engineering Work Across Different AI Coding Tools?

Yes—but the mechanism matters.

If you're building context engineering on top of a specific tool's native features (like Cursor's @codebase or Claude Code's --allowedTools), you're limited to that tool. Switch agents and you start from scratch.

The better approach is to implement context engineering at the MCP layer: the Model Context Protocol, the open standard that all major AI coding agents now support.

An MCP server that does dependency graph traversal and session memory works identically with:

  • Claude Code
  • Cursor
  • Windsurf
  • GitHub Copilot
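
At that layer, the whole system reduces to a tool the agent can call: task description in, minimal file list out. A sketch of that tool's shape, with the MCP server wiring omitted and keyword matching standing in for real symbol resolution (graph and entry points are illustrative):

```python
def get_context_for_task(task: str, graph: dict[str, list[str]],
                         entry_points: dict[str, str]) -> list[str]:
    """Shape of a tool an MCP server could expose: task in, files out.

    entry_points maps keywords to graph nodes; a real server resolves
    the task against its symbol index instead of keyword matching.
    """
    files: set[str] = set()
    for keyword, node in entry_points.items():
        if keyword in task.lower():
            files.add(node)
            files.update(graph.get(node, []))  # pull in direct dependencies
    return sorted(files)


graph = {"auth_router.py": ["user_service.py", "models/login_request.py"]}
entry_points = {"login": "auth_router.py"}
print(get_context_for_task("fix the login bug", graph, entry_points))
```

Because the tool's contract is plain data, any MCP-capable agent can call it without knowing how the graph was built.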

Frequently Asked Questions

What is the difference between prompt engineering and context engineering?
Prompt engineering focuses on crafting the instructions you give an AI model. Context engineering focuses on what information you provide alongside those instructions. For coding tasks, context engineering is more impactful: a well-written prompt with irrelevant files still produces poor results, while a simple prompt with perfectly targeted code context consistently outperforms it.
How much can context engineering reduce token costs?
In benchmarks on production codebases, graph-based context engineering reduces token usage by 58–70% compared to naive file-loading approaches. The exact saving depends on codebase size and structure — larger, more interconnected codebases typically see higher savings because the graph can prune more irrelevant nodes.
Do I need a special tool to implement context engineering?
For basic context engineering (CLAUDE.md files, manual file pinning), no special tooling is needed. However, to achieve 60%+ token reduction automatically across all sessions, you need a tool that builds a dependency graph of your codebase and performs graph-traversal-based context retrieval — this is what vexp provides via its MCP server.
How does a dependency graph differ from a file tree for context selection?
A file tree shows directory structure — it tells you where files are but not how they relate. A dependency graph captures actual import/call relationships between symbols. When you describe a task, graph traversal from the relevant entry points finds only the files and functions that are actually connected to your task, ignoring the rest of the codebase.
Does context engineering work across different AI coding tools?
Yes. Context engineering principles apply to any LLM-based coding agent. Tools like vexp implement context engineering via the Model Context Protocol (MCP), supported by 12+ agents including Claude Code, Cursor, Windsurf, and GitHub Copilot. The same optimized context reduces token usage and improves accuracy regardless of which agent consumes it.

Nicola

Developer and creator of vexp — a context engine for AI coding agents. I build tools that make AI coding assistants faster, cheaper, and actually useful on real codebases.
