Claude Code Rate Limits: Why You Hit Them and How to Stay Under

Rate limits in Claude Code come in two flavors, and they require different fixes. If you're hitting them, you need to know which one you're dealing with before you can solve it.
The first flavor: API rate limits from Anthropic (requests per minute, tokens per minute, tokens per day). The second flavor: cost-based throttling where your API spend hits a budget cap and requests start failing or slowing.
Both feel similar from the outside — Claude Code stops working, or works slowly, or returns errors. But the root causes and solutions are different.
Understanding Anthropic's Rate Limits
What the Limits Are
Anthropic enforces rate limits on the Claude API at multiple levels:
- Requests per minute (RPM): A cap on how many API calls you can make per minute. For most paid tiers, this is high enough not to be a practical problem for individual developers. For automated workflows or teams, it can become a constraint.
- Tokens per minute (TPM): A cap on total input + output tokens per minute. This is the limit most active Claude Code users hit first: large contexts and rapid iterations push token throughput past the cap, so Claude Code pauses until the rate window resets.
- Tokens per day (TPD): A daily cap on total tokens. Less commonly hit by individual developers, but relevant for automated workflows or heavy team usage.
The specific numbers depend on your API tier. Anthropic adjusts them based on account usage history and trust level. New accounts start with lower limits; as you demonstrate usage patterns, limits increase automatically.
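When you do hit an RPM or TPM cap, the API responds with HTTP 429, and the right reaction is to back off and retry rather than hammer the endpoint. A minimal sketch of that pattern in Python — the `RateLimitError` class here is a stand-in for illustration; with the real Anthropic SDK you would catch `anthropic.RateLimitError` instead:

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for the SDK's rate-limit exception (e.g. anthropic.RateLimitError)."""

def call_with_backoff(fn, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on RateLimitError with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error to the caller
            # 1s, 2s, 4s, ... plus jitter so parallel clients don't retry in sync
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))
```

The jitter matters for automated workflows: without it, several workers that got throttled together will all retry at the same instant and trip the limit again.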
TL;DR: How to Stop Hitting Claude Code Rate Limits
You’re not just unlucky when you slam into rate limits mid-refactor — you’re burning tokens faster than you realize. Most of that burn comes from:
- Broad, unguided codebase exploration
- Huge, ever-growing session context
- Re-discovering the same code multiple times
- Overly broad @mentions (entire dirs instead of key files)
The fix is to treat token efficiency as a first-class constraint in your workflow.
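To make that constraint concrete, it helps to ballpark your burn rate. A rough heuristic — an approximation, not a real tokenizer — is about 4 characters per token for English text and code:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token. For exact counts,
    # use the API's reported usage or a real tokenizer.
    return max(1, len(text) // 4)

def requests_per_minute_budget(tpm_limit: int, avg_context_chars: int) -> int:
    """How many requests of a given context size fit under a TPM cap."""
    tokens_per_request = estimate_tokens("x" * avg_context_chars)
    return tpm_limit // tokens_per_request

# Example (illustrative numbers): an ~80k-token context (~320k chars)
# under a 200k TPM cap leaves room for only 2 requests per minute.
```

Run the math on your own sessions and the problem becomes obvious: it's rarely the number of requests, it's the size of the context attached to each one.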
1. Make a Context Engine Do the Heavy Lifting
The single highest-impact change: stop letting Claude Code explore your repo blindly.
Use a context engine like vexp so Claude gets only the relevant slices of your codebase instead of scanning everything.
Setup
```bash
npm install -g vexp-cli
cd your-project
vexp-core index --workspace .
```
Then add this to CLAUDE.md:
```markdown
ALWAYS call run_pipeline FIRST before any exploration.
Do NOT use Read, Glob, or grep to search the codebase manually.
```
Effect: ~65% fewer input tokens per task in benchmarks. A 100k-token interaction drops to ~35k, which often turns a 2‑hour limit into a 6‑8‑hour workday.
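The arithmetic behind that claim, using the article's own figures (actual savings vary by codebase):

```python
tokens_before = 100_000   # typical interaction with blind exploration
reduction = 0.65          # ~65% fewer input tokens (benchmark figure)
tokens_after = round(tokens_before * (1 - reduction))  # 35_000

# Under a fixed token budget, sessions stretch by the inverse ratio:
stretch = tokens_before / tokens_after  # ~2.9x longer before hitting the cap
```

A ~2.9x stretch is how a budget that used to last two hours starts covering most of a workday.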
2. Be Surgical With @ Mentions
Every @ mention is a context load. Mentioning a directory is like saying “read this whole folder.”
Avoid
```text
@src/ fix the authentication flow
```
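Prefer
Target the specific files instead (the paths below are illustrative, not from a real project):
```text
@src/auth/login.ts @src/auth/session.ts fix the authentication flow
```
Each mention then loads only the files that matter, not an entire directory tree.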
Nicola
Developer and creator of vexp — a context engine for AI coding agents. I build tools that make AI coding assistants faster, cheaper, and actually useful on real codebases.