Using Claude Code with FastAPI: Benchmark-Proven Token Optimization
FastAPI is an ideal stress test for AI coding assistants. It combines rich type annotations, Pydantic models, dependency injection, async I/O, and ORM integration—enough structure to be machine-readable, but enough complexity that naïve context loading quickly becomes expensive.
We benchmarked Claude Code on a real-world FastAPI e‑commerce backend and measured how a graph-aware context engine (vexp) changes both cost and quality. This post walks through the benchmark, the results, and how to apply the same setup to your own FastAPI projects.
FastAPI + vexp Benchmark Summary
Our FastAPI benchmark shows that dependency-graph–driven context selection (via vexp) materially improves AI coding performance on a realistic, typed Python codebase.
Key Setup
- Codebase: FastAPI framework + representative app layer
- Tasks: 7 realistic dev tasks (bugs, features, refactors, docs/tests)
- Runs: 21 per arm (baseline vs. vexp)
- Model: Claude 3.5 Sonnet (API)
- Agent: Claude Code with MCP
Tasks Covered
- Fix validation bug in request body handling
- Add rate limiting middleware to an existing route
- Refactor dependency injection in the auth module
- Add a new endpoint with proper error handling
- Update tests for a modified response schema
- Add OpenAPI docs to undocumented endpoints
- Diagnose a DB-layer performance issue
Each task required coherent edits across ~5–15 files.
Aggregate Results
| Metric | Baseline | With vexp | Change |
|--------|----------|-----------|--------|
| Input tokens (avg) | 84,200 | 29,500 | -65% |
| Total API cost (avg) | $0.38 | $0.16 | -58% |
| Task completion time | 4.2 min | 3.3 min | -22% |
| Task completion rate | 71% | 85% | +14pp |
Every task and category improved on every metric; per-task token reduction ranged from ~45% to ~70%.
Why FastAPI Is a Strong Stress Test
FastAPI exposes weaknesses in naive context selection:
- Dependency injection chains (`Depends()`):
  - Long chains: `get_current_user → get_token → get_db` → db session config.
  - Naive agents either miss links or over-include unrelated dependencies.
- Decorator-based routing:
  - Routes registered via `@router.get(...)` etc. aren't discovered by simple import following.
  - Agents that only track imports miss router–handler connections.
- Pydantic model proliferation:
  - Distinct types like `UserResponse`, `UserCreate`, `UserUpdate`.
  - Conflating or omitting them leads to incorrect code.
- Test/source separation:
  - Tests live in a separate tree.
  - Naive context grabbers often pull in tests instead of focusing on source + models + deps.
These patterns create token-wasting traps that the benchmark makes visible.
The Context Problem (Task 3 Example)
Task: Refactor dependency injection in auth.
Naive flow
- Start at `routers/users.py`.
- Follow `Depends(get_db)`, `Depends(get_current_user)`.
- Read `dependencies/database.py`, `dependencies/auth.py`.
- From `auth.py`, follow to `security.py`, `models/user.py`, `schemas/user.py`.
- From schemas, follow to `models/base.py`.
- Read adjacent `/routers/` files.
- Read `/tests/routers/` because they're nearby.
Outcome: ~40+ files, ~84k tokens, when ~8 files are truly needed.
With vexp dependency graph
- `run_pipeline` seeds the graph at `routers/users.py`.
- Traverses imports, relevant dependencies, related schemas.
- Ranks by centrality: `UserService`, `UserSchema`, `DatabaseSession` high; tests low.
- Returns a capsule of ~8 files (~29k tokens).
Same task, same model, ~65% fewer tokens and higher completion rate.
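The graph-based selection above can be sketched in a few lines. This is a toy illustration using a hand-written dependency graph and simple in-degree centrality, not vexp's actual algorithm; file names mirror the auth-refactor walkthrough:

```python
from collections import deque

# Hypothetical dependency graph for the auth refactor task:
# each file maps to the files it depends on.
GRAPH = {
    "routers/users.py": ["dependencies/auth.py", "dependencies/database.py", "schemas/user.py"],
    "dependencies/auth.py": ["security.py", "models/user.py", "schemas/user.py"],
    "dependencies/database.py": ["models/base.py"],
    "schemas/user.py": ["models/user.py"],
    "models/user.py": ["models/base.py"],
    "security.py": [],
    "models/base.py": [],
    "tests/routers/test_users.py": ["routers/users.py"],
}

def select_context(seed: str, budget: int) -> list[str]:
    """BFS from the seed over outgoing dependency edges, then rank the
    reachable files by in-degree (how many selected files depend on them)
    and keep at most `budget` files."""
    seen, queue = {seed}, deque([seed])
    while queue:
        for dep in GRAPH.get(queue.popleft(), []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    in_degree = {f: 0 for f in seen}
    for f in seen:
        for dep in GRAPH.get(f, []):
            in_degree[dep] += 1
    # Tests never appear: BFS follows edges *out of* the seed, so files
    # that merely depend on it (like tests) are excluded by construction.
    return sorted(seen, key=lambda f: in_degree[f], reverse=True)[:budget]
```

Calling `select_context("routers/users.py", budget=8)` returns the seed plus its transitive dependencies, with shared files such as `schemas/user.py` and `models/base.py` ranked near the top and the test tree excluded entirely, which is the behavior the capsule above relies on.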
How to Reproduce This in Your FastAPI Project
1. Install vexp CLI
Frequently Asked Questions
Why does Claude Code use so many tokens on FastAPI projects?
How does vexp optimize context for Python and FastAPI codebases?
Can I use vexp with Claude Code on a FastAPI monorepo?
What FastAPI-specific patterns waste the most tokens?
How do I benchmark token savings for my FastAPI project?
Nicola
Developer and creator of vexp — a context engine for AI coding agents. I build tools that make AI coding assistants faster, cheaper, and actually useful on real codebases.
Related Articles

Claude Code Rate Limits: Why You Hit Them and How to Stay Under
Hitting Claude Code rate limits? The root cause is usually high tokens per request, not total usage. Here's the math and the fixes.

Cross-Agent Context: How to Share Memory Between Cursor, Claude Code, and Codex
Using Cursor, Claude Code, and Codex? Each tool starts from zero every session. Here's how to build shared context across AI coding agents — and why it matters.

Stale Context in AI Coding: When Yesterday's Knowledge Breaks Today's Code
Stale context causes AI coding bugs that look like hallucinations but aren't. Here's why it happens, why it's getting worse, and how to detect it.