Claude Code Rate Limits: Why You Hit Them and How to Stay Under

Rate limits in Claude Code come in two flavors, and they require different fixes. If you're hitting them, you need to know which one you're dealing with before you can solve it.

The first flavor: API rate limits from Anthropic (requests per minute, tokens per minute, tokens per day). The second flavor: cost-based throttling where your API spend hits a budget cap and requests start failing or slowing.

Both feel similar from the outside — Claude Code stops working, or works slowly, or returns errors. But the root causes and solutions are different.

Understanding Anthropic's Rate Limits

What the Limits Are

Anthropic enforces rate limits on the Claude API at multiple levels:

  • Requests per minute (RPM): A cap on how many API calls you can make per minute. For most paid tiers, this is high enough not to be a practical problem for individual developers. For automated workflows or teams, it can become a constraint.
  • Tokens per minute (TPM): A cap on the total input + output tokens per minute. This is the limit most active Claude Code users hit first: when Claude Code pauses mid-task, it is usually because you are pushing tokens through faster than your per-minute allowance.
  • Tokens per day (TPD): A daily cap on total tokens. Less commonly hit by individual developers, but relevant for automated workflows or heavy team usage.

The specific numbers depend on your API tier. Anthropic adjusts them based on account usage history and trust level. New accounts start with lower limits; as you demonstrate usage patterns, limits increase automatically.
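When you do hit one of these limits, the API responds with an HTTP 429, and the standard remedy is exponential backoff with jitter. A minimal sketch, assuming a generic client — `RateLimitedError` here is a stand-in for whatever exception your HTTP client or SDK raises on a 429:

```python
import random
import time

class RateLimitedError(Exception):
    """Stand-in for whatever exception your client raises on HTTP 429."""

def with_backoff(call, max_retries=5, base_delay=1.0):
    """Retry `call` with exponential backoff plus jitter on rate limits."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitedError:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error to the caller
            # Wait base_delay * 2^attempt, plus jitter so parallel
            # workers don't all retry at the same instant.
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
```

If the 429 response carries a `retry-after` header, honoring it directly beats guessing with backoff.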

TL;DR: How to Stop Hitting Claude Code Rate Limits

You’re not just unlucky when you slam into rate limits mid-refactor — you’re burning tokens faster than you realize. Most of that burn comes from:

  • Broad, unguided codebase exploration
  • Huge, ever-growing session context
  • Re-discovering the same code multiple times
  • Overly broad @ mentions (entire dirs instead of key files)

The fix is to treat token efficiency as a first-class constraint in your workflow.
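One way to make that constraint concrete is to watch your own burn rate against the per-minute budget. A rough sketch of a sliding-window tokens-per-minute tracker — the `tpm_limit` value is hypothetical; substitute whatever your tier actually allows:

```python
import time
from collections import deque

class TokenRateTracker:
    """Sliding-window tracker for a tokens-per-minute budget."""

    def __init__(self, tpm_limit):
        self.tpm_limit = tpm_limit
        self.events = deque()  # (timestamp, tokens) pairs

    def _prune(self, now):
        # Drop events older than the 60-second window.
        while self.events and now - self.events[0][0] >= 60:
            self.events.popleft()

    def record(self, tokens, now=None):
        now = time.monotonic() if now is None else now
        self._prune(now)
        self.events.append((now, tokens))

    def used(self, now=None):
        now = time.monotonic() if now is None else now
        self._prune(now)
        return sum(t for _, t in self.events)

    def would_exceed(self, tokens, now=None):
        """True if sending `tokens` now would blow the per-minute budget."""
        now = time.monotonic() if now is None else now
        return self.used(now) + tokens > self.tpm_limit
```

Checking `would_exceed` before a large request lets you pause deliberately instead of being cut off mid-task.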

1. Make a Context Engine Do the Heavy Lifting

The single highest-impact change: stop letting Claude Code explore your repo blindly.

Use a context engine like vexp so Claude gets only the relevant slices of your codebase instead of scanning everything.

Setup

```bash
npm install -g vexp-cli        # install the vexp CLI globally
cd your-project
vexp-core index --workspace .  # build the index for the current workspace
```

Then add this to CLAUDE.md:

```markdown
ALWAYS call run_pipeline FIRST before any exploration.
Do NOT use Read, Glob, or grep to search the codebase manually.
```

Effect: roughly 65% fewer input tokens per task in benchmarks. A 100k-token interaction drops to ~35k, which often stretches a session that hits its limit after 2 hours into a full 6-8 hour workday.
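The arithmetic behind that claim is straightforward. With illustrative numbers (a hypothetical 2M-token effective budget and 10 interactions per hour — not Anthropic's actual limits), cutting per-interaction input from 100k to 35k tokens stretches time-to-limit by roughly 2.9x:

```python
def hours_until_limit(budget_tokens, tokens_per_interaction, interactions_per_hour):
    """Hours of work before a token budget is exhausted."""
    return budget_tokens / (tokens_per_interaction * interactions_per_hour)

# Illustrative numbers only, not Anthropic's actual limits:
before = hours_until_limit(2_000_000, 100_000, 10)  # 2.0 hours
after = hours_until_limit(2_000_000, 35_000, 10)    # ~5.7 hours
```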

2. Be Surgical With @ Mentions

Every @ mention is a context load. Mentioning a directory is like saying “read this whole folder.”

Avoid

```text
@src/ fix the authentication flow
```

Prefer

Name only the files the task actually touches, e.g. `@src/auth/login.ts fix the authentication flow` (path illustrative). Claude then loads that one file instead of the entire directory.

Frequently Asked Questions

Why do I keep hitting Claude Code rate limits?

Rate limits are triggered by sending too many tokens per minute or too many requests per minute. Long sessions with large context windows are the primary cause: each API call re-sends the full conversation history plus loaded files, quickly exhausting your per-minute token allowance.

What is the difference between token limits and rate limits in Claude Code?

Token limits cap the total context window size per request (e.g., 200K tokens). Rate limits cap how many tokens or requests you can send per minute across all requests. You can hit rate limits even with small individual requests if you make them frequently enough, or with fewer large requests that send massive context.

How can I reduce Claude Code rate limit errors without slowing down?

The most effective approach is reducing input tokens per request. Use shorter sessions (one task per session), load only relevant code snippets instead of full files, and use quiet command flags. A context engine like vexp can reduce input tokens by 58-65% automatically, letting you stay under rate limits while maintaining your work pace.

Do different Claude Code plans have different rate limits?

Yes. API rate limits vary by plan tier and model. Higher-tier plans get higher tokens-per-minute and requests-per-minute allowances. However, optimizing token usage is more cost-effective than upgrading plans: reducing tokens by 60% gives you 2.5x the effective rate-limit capacity at no additional cost.

Does context window size affect rate limits?

Yes, directly. Every token in your context window counts toward your per-minute token rate limit. A session with 100K tokens of context uses 10x more of your rate-limit budget per request than a session with 10K tokens. Keeping context lean is the single most effective way to avoid rate limit errors.
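The history growth described in the first FAQ answer is easy to see in numbers. A sketch, assuming each turn adds 1,000 tokens of new content and every call re-sends the full history so far:

```python
def total_tokens_sent(turn_sizes):
    """Total input tokens across a session where every API call
    re-sends the entire conversation history so far."""
    total, history = 0, 0
    for new_tokens in turn_sizes:
        history += new_tokens   # history grows by this turn's content
        total += history        # the call sends all of it again
    return total

# Ten turns of 1,000 tokens each: 10k of content, 55k input tokens billed.
total_tokens_sent([1000] * 10)  # 55,000
```

Input token usage grows quadratically with turn count, which is why one-task-per-session keeps you under per-minute limits far longer than one marathon session.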

Nicola

Developer and creator of vexp — a context engine for AI coding agents. I build tools that make AI coding assistants faster, cheaper, and actually useful on real codebases.
