Updated March 2026
Claude API Rate Limits
Rate limits control how many requests and tokens you can use per minute and per day. Your limits depend on your account tier, which upgrades automatically as your cumulative spend increases.
Rate Limits by Tier
All limits apply per minute (RPM, TPM) or per day (TPD). Limits are shared across models within a tier, with some model-specific adjustments noted below.
| Tier | Requests/min | Tokens/min | Tokens/day | How to unlock |
|---|---|---|---|---|
| Free | 5 | 20,000 | 300,000 | Sign up (no card required) |
| Build Tier 1 | 50 | 40,000 | 1,000,000 | $5 deposit |
| Build Tier 2 | 1,000 | 80,000 | 2,500,000 | $40 cumulative spend |
| Build Tier 3 | 2,000 | 160,000 | 5,000,000 | $200 cumulative spend |
| Build Tier 4 | 4,000 | 400,000 | 10,000,000 | $400 cumulative spend |
| Scale | 4,000+ | 400,000+ | Custom | Contact sales |
| Enterprise | Custom | Custom | Custom | Enterprise agreement |
Limits shown are for Sonnet 4. Opus has lower effective RPM. Haiku has the same RPM but faster processing. Scale and Enterprise limits are customised per account.
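To see which of the three limits actually constrains a workload, the table can be turned into a small calculation. The sketch below is illustrative only: the `TierLimits` class and `max_sustained_rpm` helper are hypothetical names, the figures are copied from the table above, and `tokens_per_request` is assumed to be the total number of tokens counted against your limits for one request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierLimits:
    rpm: int  # requests per minute
    tpm: int  # tokens per minute
    tpd: int  # tokens per day


# Values copied from the table above (Sonnet 4 figures).
TIERS = {
    "Free": TierLimits(5, 20_000, 300_000),
    "Build Tier 1": TierLimits(50, 40_000, 1_000_000),
    "Build Tier 2": TierLimits(1_000, 80_000, 2_500_000),
    "Build Tier 3": TierLimits(2_000, 160_000, 5_000_000),
    "Build Tier 4": TierLimits(4_000, 400_000, 10_000_000),
}


def max_sustained_rpm(limits: TierLimits, tokens_per_request: int) -> tuple[float, str]:
    """Return the highest steady requests/min and the limit that caps it."""
    candidates = {
        "RPM": float(limits.rpm),
        "TPM": limits.tpm / tokens_per_request,
        # Spread the daily token budget evenly over 1,440 minutes.
        "TPD": limits.tpd / tokens_per_request / 1440,
    }
    binding = min(candidates, key=candidates.get)
    return candidates[binding], binding


rate, binding = max_sustained_rpm(TIERS["Build Tier 1"], tokens_per_request=2_000)
print(f"{rate:.2f} requests/min, capped by {binding}")
```

For a workload that runs around the clock at 2,000 tokens per request on Build Tier 1, the daily token budget is the binding limit (about 0.35 requests per minute), well before RPM or TPM come into play.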
How to Check Your Current Tier
Via the Anthropic console
Log in to console.anthropic.com and navigate to Settings > Plans. Your current tier is displayed prominently, along with your cumulative spend and the spend threshold needed for the next tier. The dashboard also shows your current usage against your limits in real time.
Via API response headers
Every API response includes rate limit headers that tell you your current limits and remaining quota. You can read these programmatically to monitor your usage:

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    # with_raw_response exposes the HTTP response, including its headers;
    # the parsed Message returned by a plain create() call does not carry them.
    raw = client.messages.with_raw_response.create(
        model="claude-sonnet-4-20250514",
        max_tokens=100,
        messages=[{"role": "user", "content": "Hello"}],
    )
    message = raw.parse()  # the usual Message object

    # Access rate limit info from response headers
    print(f"Requests remaining: {raw.headers.get('anthropic-ratelimit-requests-remaining')}")
    print(f"Tokens remaining: {raw.headers.get('anthropic-ratelimit-tokens-remaining')}")
    print(f"Resets at: {raw.headers.get('anthropic-ratelimit-requests-reset')}")

How to Increase Your Rate Limits
Spend more (automatic tier upgrades)
Build tiers upgrade automatically based on cumulative spend. Deposit $5 for Tier 1, spend $40 total for Tier 2, $200 for Tier 3, $400 for Tier 4. No application needed -- upgrades apply within minutes of reaching the threshold.
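As a rough planning aid, the thresholds above can be expressed as a lookup. This is a sketch under stated assumptions: `expected_build_tier` and `spend_to_next_tier` are hypothetical helpers, the dollar figures are the ones quoted in this article, and your actual tier is always whatever the console reports.

```python
# Cumulative-spend thresholds in USD, taken from the tier table in this article.
# Tier 1 is unlocked by a $5 deposit rather than by spend, so it is modelled
# separately through the has_deposit flag (a simplification of this sketch).
SPEND_THRESHOLDS = [
    (400, "Build Tier 4"),
    (200, "Build Tier 3"),
    (40, "Build Tier 2"),
]


def expected_build_tier(cumulative_spend_usd, has_deposit=True):
    """Return the Build tier implied by cumulative spend."""
    for threshold, tier in SPEND_THRESHOLDS:
        if cumulative_spend_usd >= threshold:
            return tier
    return "Build Tier 1" if has_deposit else "Free"


def spend_to_next_tier(cumulative_spend_usd):
    """USD still needed to reach the next threshold, or None at Tier 4."""
    for threshold, _ in reversed(SPEND_THRESHOLDS):
        if cumulative_spend_usd < threshold:
            return threshold - cumulative_spend_usd
    return None


print(expected_build_tier(150), spend_to_next_tier(150))
```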
Apply for Scale tier
If Build Tier 4 limits are not enough, apply for the Scale tier through the Anthropic console. Scale provides limits equal to or higher than Tier 4, plus priority access, a dedicated account contact, and the option of custom limits based on your workload.
Enterprise agreement
For the highest limits with guaranteed capacity, contact Anthropic sales. Enterprise agreements include custom RPM, TPM, and TPD limits based on your expected volume, with reserved capacity that is unaffected by platform-wide demand.
Optimise your usage pattern
Sometimes you can stay within limits by optimising rather than upgrading. Use the Batch API, which has separate, higher limits, for work that does not need a real-time response. Implement request queuing with exponential backoff. Reduce token usage per request with shorter prompts and lower max_tokens settings.
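For the Batch API suggestion, most of the work is reshaping individual requests into a list of batch entries. The sketch below builds that list with a hypothetical `build_batch_requests` helper. The entry shape (a `custom_id` plus a `params` object mirroring a normal Messages request) and the `client.messages.batches.create` call shown in the comment reflect my understanding of the Message Batches API and should be checked against the current SDK reference before use.

```python
def build_batch_requests(prompts, model="claude-sonnet-4-20250514", max_tokens=256):
    """Turn a list of prompt strings into batch request entries."""
    return [
        {
            "custom_id": f"request-{i}",  # used to match results back to inputs
            "params": {
                "model": model,
                "max_tokens": max_tokens,  # keep this as low as the task allows
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for i, prompt in enumerate(prompts)
    ]


requests = build_batch_requests(["Summarise document A", "Summarise document B"])

# With the official Python SDK installed and an API key configured, the
# submission step is assumed to look like this (not executed here):
#   import anthropic
#   batch = anthropic.Anthropic().messages.batches.create(requests=requests)
```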
Rate Limit Headers Explained
Every API response includes these headers. Use them to monitor your usage and implement smart retry logic.
| Header | Description |
|---|---|
| anthropic-ratelimit-requests-limit | Maximum requests allowed in the current time window |
| anthropic-ratelimit-requests-remaining | Requests remaining before you hit the limit |
| anthropic-ratelimit-requests-reset | ISO 8601 timestamp when the request limit resets |
| anthropic-ratelimit-tokens-limit | Maximum tokens allowed in the current time window |
| anthropic-ratelimit-tokens-remaining | Tokens remaining before you hit the limit |
| anthropic-ratelimit-tokens-reset | ISO 8601 timestamp when the token limit resets |
| retry-after | Seconds to wait before retrying (only present on 429 responses) |
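The headers in the table arrive as strings, so it can help to parse them once into typed values. The following is a minimal sketch, assuming the header names listed above and ISO 8601 reset timestamps. `RateLimitSnapshot` and `parse_rate_limit_headers` are hypothetical names, and the sample values are made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Mapping, Optional


@dataclass(frozen=True)
class RateLimitSnapshot:
    requests_limit: Optional[int]
    requests_remaining: Optional[int]
    requests_reset: Optional[datetime]
    tokens_limit: Optional[int]
    tokens_remaining: Optional[int]
    tokens_reset: Optional[datetime]
    retry_after: Optional[float]  # only set on 429 responses


def _int(value):
    return int(value) if value is not None else None


def _timestamp(value):
    if value is None:
        return None
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11,
    # so normalise it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitSnapshot:
    """Parse rate limit headers; keys are expected in lowercase here.

    HTTP client header objects are usually case-insensitive, but a plain
    dict is not, so lowercase the keys first if you build one by hand.
    """
    prefix = "anthropic-ratelimit-"
    retry_after = headers.get("retry-after")
    return RateLimitSnapshot(
        requests_limit=_int(headers.get(prefix + "requests-limit")),
        requests_remaining=_int(headers.get(prefix + "requests-remaining")),
        requests_reset=_timestamp(headers.get(prefix + "requests-reset")),
        tokens_limit=_int(headers.get(prefix + "tokens-limit")),
        tokens_remaining=_int(headers.get(prefix + "tokens-remaining")),
        tokens_reset=_timestamp(headers.get(prefix + "tokens-reset")),
        retry_after=float(retry_after) if retry_after is not None else None,
    )


# Made-up sample values, for illustration only.
sample = {
    "anthropic-ratelimit-requests-limit": "50",
    "anthropic-ratelimit-requests-remaining": "42",
    "anthropic-ratelimit-requests-reset": "2026-03-01T12:00:30Z",
    "anthropic-ratelimit-tokens-limit": "40000",
    "anthropic-ratelimit-tokens-remaining": "31500",
    "anthropic-ratelimit-tokens-reset": "2026-03-01T12:00:30Z",
}
snapshot = parse_rate_limit_headers(sample)
print(snapshot.requests_remaining, snapshot.tokens_reset)
```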
Handling Rate Limit Errors (429 Responses)
What a 429 response means
A 429 status code means you have exceeded one of your rate limits (RPM, TPM, or TPD). The request was not processed and you are not charged. The response includes a retry-after header with the number of seconds to wait before retrying. This is the minimum wait time -- retrying sooner will result in another 429.
Recommended retry strategy
Implement exponential backoff with jitter. This prevents all your retries from hitting the API at the same instant when the rate limit window resets:

    import random
    import time

    import anthropic

    client = anthropic.Anthropic()


    def call_with_retry(messages, max_retries=5):
        for attempt in range(max_retries):
            try:
                return client.messages.create(
                    model="claude-sonnet-4-20250514",
                    max_tokens=1024,
                    messages=messages,
                )
            except anthropic.RateLimitError as e:
                # Give up and re-raise once the final attempt has failed
                if attempt == max_retries - 1:
                    raise
                # Exponential backoff with jitter, capped at 60 seconds
                wait = min(2 ** attempt + random.random(), 60)
                # Never wait less than the server-provided retry-after value
                retry_after = e.response.headers.get("retry-after")
                if retry_after:
                    wait = max(wait, float(retry_after))
                time.sleep(wait)

The Anthropic Python and Node.js SDKs include built-in retry logic with exponential backoff. If you use the official SDK, basic retry handling works out of the box. Custom retry logic is only needed if you want finer control over backoff timing or retry limits.
Proactive rate limit management
Instead of waiting for 429s, track the anthropic-ratelimit-requests-remaining and anthropic-ratelimit-tokens-remaining headers. When the remaining capacity drops below 20% of the corresponding limit, start throttling your request rate. This avoids most 429s and provides a smoother experience.
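One possible way to implement the 20% rule is to spread whatever quota is left evenly across the time remaining in the window. The sketch below assumes you have already read the remaining count, the limit, and the seconds until reset from the response headers. `throttle_delay` is a hypothetical helper, and the even-spacing policy is a design choice of this example rather than anything prescribed by the API.

```python
def throttle_delay(remaining, limit, seconds_until_reset, threshold=0.20):
    """Return the number of seconds to pause before the next request.

    At or above the threshold no pause is applied. Below it, the remaining
    quota is spread evenly over the time left in the current window.
    """
    if limit <= 0 or remaining / limit >= threshold:
        return 0.0
    window = max(seconds_until_reset, 0.0)
    if remaining <= 0:
        return window  # nothing left: wait for the window to reset
    return window / remaining


# 5 of 50 requests left with 30 seconds to go: space calls 6 seconds apart.
print(throttle_delay(remaining=5, limit=50, seconds_until_reset=30))
```

The same function can be applied to the token headers. When both are tracked, taking the larger of the two delays is the conservative choice.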
Request queuing
For high-volume applications, implement a request queue with a token bucket rate limiter. Queue incoming requests and drain them at a rate just below your tier limit. This smooths out burst traffic and prevents 429 errors while maximising throughput.
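A token bucket can be sketched in a few lines. The version below is a minimal, single-threaded illustration: `TokenBucket` and `requests_bucket` are hypothetical names, the 90% headroom and burst size of 5 are arbitrary example values, and "tokens" here means bucket permits rather than model tokens. The clock is injectable so the behaviour can be checked without sleeping.

```python
import time


class TokenBucket:
    """Token bucket that refills at `rate` units/second up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity  # start full so an initial burst is allowed
        self._updated = clock()

    def _refill(self):
        now = self._clock()
        elapsed = now - self._updated
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        self._updated = now

    def try_acquire(self, cost=1.0):
        """Take `cost` units if available; return True on success."""
        self._refill()
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False

    def wait_time(self, cost=1.0):
        """Seconds until `cost` units are available (0.0 if available now)."""
        self._refill()
        return max(0.0, (cost - self._tokens) / self.rate)


def requests_bucket(rpm, headroom=0.9, burst=5):
    """Bucket for a per-minute request limit, draining just below it."""
    return TokenBucket(rate=rpm * headroom / 60.0, capacity=burst)


bucket = requests_bucket(rpm=50)  # Build Tier 1 figure from the tier table
if bucket.try_acquire():
    print("send the next queued request")
else:
    print(f"wait {bucket.wait_time():.2f}s before sending")
```

A queue worker would call `try_acquire()` before each request and sleep for `wait_time()` when it returns False. A second bucket sized from the TPM limit, with `cost` set to the estimated tokens per request, covers the token limit in the same way. For multi-threaded use, the state updates would need a lock.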
Model-Specific Rate Limit Notes
While rate limit tiers apply to your account, different models have different effective throughput due to their size and processing speed.
Claude Opus 4
Lower RPM than listed tier limits
Opus requires more compute per request due to its size and reasoning depth. Expect roughly 50-75% of your tier's listed RPM.
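When sizing a queue or rate limiter for Opus, the 50-75% estimate can be applied directly to the listed figure. This is a small arithmetic sketch. `opus_effective_rpm` is a hypothetical helper and the percentages are the rough estimate quoted above, not a published limit.

```python
def opus_effective_rpm(listed_rpm, low=0.50, high=0.75):
    """Return the (low, high) RPM range to plan for with Opus."""
    return listed_rpm * low, listed_rpm * high


low, high = opus_effective_rpm(1_000)  # Build Tier 2 lists 1,000 RPM
print(f"Plan for roughly {low:.0f} to {high:.0f} requests/min")
```

Planning against the lower bound is the safer default until the response headers show what your account actually receives.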
Claude Sonnet 4
Standard tier limits apply
Sonnet is the baseline model for rate limit tiers. The limits in the table above apply directly to Sonnet requests.
Claude Haiku 3.5
Higher effective throughput
Haiku processes requests faster due to its smaller size. While the RPM limit is the same, you are less likely to hit TPM limits because responses are generated more quickly.
Enterprise Rate Limits
Enterprise customers receive custom rate limits negotiated as part of their agreement. These are not bound by the standard tier system and can be set to any level based on your requirements and Anthropic's capacity planning.
Reserved capacity
Enterprise limits are backed by reserved compute capacity, meaning your throughput is guaranteed regardless of overall platform demand.
Burst handling
Enterprise plans can include burst allowances that let you temporarily exceed your sustained rate limit for short periods without hitting 429 errors.
Per-model limits
Custom limits can be set per model. For example, high RPM for Haiku classification and moderate RPM for Opus reasoning, each with independent quotas.
Priority queuing
Enterprise requests receive priority in Anthropic's processing queue, resulting in lower latency even during peak platform usage.
Contact Anthropic sales to discuss enterprise rate limits for your workload.