Using Claude Code with FastAPI: Benchmark-Proven Token Optimization
FastAPI is an ideal stress test for AI coding assistants. It combines rich type annotations, Pydantic models, dependency injection, async I/O, and ORM integration—enough structure to be machine-readable, but enough complexity that naïve context loading quickly becomes expensive.
We benchmarked Claude Code on a real-world FastAPI e‑commerce backend and measured how a graph-aware context engine (vexp) changes both cost and quality. This post walks through the benchmark, the results, and how to apply the same setup to your own FastAPI projects.
FastAPI + vexp Benchmark Summary
Our FastAPI benchmark shows that dependency-graph–driven context selection (via vexp) materially improves AI coding performance on a realistic, typed Python codebase.
Key Setup
- Codebase: FastAPI framework + representative app layer
- Tasks: 7 realistic dev tasks (bugs, features, refactors, docs/tests)
- Runs: 21 per arm (baseline vs. vexp)
- Model: Claude 3.5 Sonnet (API)
- Agent: Claude Code with MCP
Tasks Covered
- Fix validation bug in request body handling
- Add rate limiting middleware to an existing route
- Refactor dependency injection in the auth module
- Add a new endpoint with proper error handling
- Update tests for a modified response schema
- Add OpenAPI docs to undocumented endpoints
- Diagnose a DB-layer performance issue
Each task required coherent edits across ~5–15 files.
Aggregate Results
| Metric | Baseline | With vexp | Change |
|--------|----------|-----------|--------|
| Input tokens (avg) | 84,200 | 29,500 | -65% |
| Total API cost (avg) | $0.38 | $0.16 | -58% |
| Task completion time | 4.2 min | 3.3 min | -22% |
| Task completion rate | 71% | 85% | +14pp |
Every task and category improved on every metric; per-task token reduction ranged from ~45% to ~70%.
Why FastAPI Is a Strong Stress Test
FastAPI exposes weaknesses in naive context selection:
- Dependency injection chains (`Depends()`):
  - Long chains: `get_current_user → get_token → get_db` → db session config.
  - Naive agents either miss links or over-include unrelated dependencies.
- Decorator-based routing:
  - Routes registered via `@router.get(...)` etc. aren't discovered by simple import following.
  - Agents that only track imports miss router–handler connections.
- Pydantic model proliferation:
  - Distinct types like `UserResponse`, `UserCreate`, `UserUpdate`.
  - Conflating or omitting them leads to incorrect code.
- Test/source separation:
  - Tests live in a separate tree.
  - Naive context grabbers often pull in tests instead of focusing on source + models + deps.
These patterns create token-wasting traps that the benchmark makes visible.
The Context Problem (Task 3 Example)
Task: Refactor dependency injection in auth.
Naive flow
- Start at `routers/users.py`.
- Follow `Depends(get_db)`, `Depends(get_current_user)`.
- Read `dependencies/database.py`, `dependencies/auth.py`.
- From `auth.py`, follow to `security.py`, `models/user.py`, `schemas/user.py`.
- From schemas, follow to `models/base.py`.
- Read adjacent `/routers/` files.
- Read `/tests/routers/` because they're nearby.
Outcome: ~40+ files, ~84k tokens, when ~8 files are truly needed.
With vexp dependency graph
- `run_pipeline` seeds the graph at `routers/users.py`.
- Traverses imports, relevant dependencies, related schemas.
- Ranks by centrality: `UserService`, `UserSchema`, `DatabaseSession` high; tests low.
- Returns a capsule of ~8 files (~29k tokens).
Same task, same model, ~65% fewer tokens and higher completion rate.
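The graph-based selection above can be sketched in a few lines. This is a toy illustration using a hand-written dependency graph and simple in-degree centrality, not vexp's actual algorithm; file names mirror the auth-refactor walkthrough:

```python
from collections import deque

# Hypothetical dependency graph for the auth refactor task:
# each file maps to the files it depends on.
GRAPH = {
    "routers/users.py": ["dependencies/auth.py", "dependencies/database.py", "schemas/user.py"],
    "dependencies/auth.py": ["security.py", "models/user.py", "schemas/user.py"],
    "dependencies/database.py": ["models/base.py"],
    "schemas/user.py": ["models/user.py"],
    "models/user.py": ["models/base.py"],
    "security.py": [],
    "models/base.py": [],
    "tests/routers/test_users.py": ["routers/users.py"],
}

def select_context(seed: str, budget: int) -> list[str]:
    """BFS from the seed over outgoing dependency edges, then rank the
    reachable files by in-degree (how many selected files depend on them)
    and keep at most `budget` files."""
    seen, queue = {seed}, deque([seed])
    while queue:
        for dep in GRAPH.get(queue.popleft(), []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    in_degree = {f: 0 for f in seen}
    for f in seen:
        for dep in GRAPH.get(f, []):
            in_degree[dep] += 1
    # Tests never appear: BFS follows edges *out of* the seed, so files
    # that merely depend on it (like tests) are excluded by construction.
    return sorted(seen, key=lambda f: in_degree[f], reverse=True)[:budget]
```

Calling `select_context("routers/users.py", budget=8)` returns the seed plus its transitive dependencies, with shared files such as `schemas/user.py` and `models/base.py` ranked near the top and the test tree excluded entirely, which is the behavior the capsule above relies on.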
How to Reproduce This in Your FastAPI Project
1. Install vexp CLI
Frequently Asked Questions
Why does Claude Code use so many tokens on FastAPI projects?
How does vexp optimize context for Python and FastAPI codebases?
Can I use vexp with Claude Code on a FastAPI monorepo?
What FastAPI-specific patterns waste the most tokens?
How do I benchmark token savings for my FastAPI project?
Nicola
Developer and creator of vexp — a context engine for AI coding agents. I build tools that make AI coding assistants faster, cheaper, and actually useful on real codebases.
Related Articles

Claude Code Rate Limits: Why You Hit Them and How to Stay Under
Hitting Claude Code rate limits? The root cause is usually high tokens per request, not total usage. Here's the math and the fixes.

Cross-Agent Context: How to Share Memory Between Cursor, Claude Code, and Codex
Using Cursor, Claude Code, and Codex? Each tool starts from zero every session. Here's how to build shared context across AI coding agents — and why it matters.

Stale Context in AI Coding: When Yesterday's Knowledge Breaks Today's Code
Stale context causes AI coding bugs that look like hallucinations but aren't. Here's why it happens, why it's getting worse, and how to detect it.