Updated March 2026

Claude API Rate Limits

Rate limits control how many requests and tokens you can use per minute and per day. Your limits depend on your account tier, which upgrades automatically as your cumulative spend increases.

Rate Limits by Tier

All limits apply per minute (RPM, TPM) or per day (TPD). Limits are shared across models within a tier, with some model-specific adjustments noted below.

Tier	Requests/min	Tokens/min	Tokens/day	How to unlock
Free	5	20,000	300,000	Sign up (no card required)
Build Tier 1	50	40,000	1,000,000	$5 deposit
Build Tier 2	1,000	80,000	2,500,000	$40 cumulative spend
Build Tier 3	2,000	160,000	5,000,000	$200 cumulative spend
Build Tier 4	4,000	400,000	10,000,000	$400 cumulative spend
Scale	4,000+	400,000+	Custom	Contact sales
Enterprise	Custom	Custom	Custom	Enterprise agreement

Limits shown are for Sonnet 4. Opus has lower effective RPM. Haiku has the same RPM but faster processing. Scale and Enterprise limits are customised per account.

How to Check Your Current Tier

Via the Anthropic console

Log in to console.anthropic.com and navigate to Settings > Plans. Your current tier is displayed prominently, along with your cumulative spend and the spend threshold needed for the next tier. The dashboard also shows your current usage against your limits in real time.

Via API response headers

Every API response includes rate limit headers that tell you your current limits and remaining quota. You can read these programmatically to monitor your usage:

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=100,
    messages=[{"role": "user", "content": "Hello"}]
)

# Access rate limit info from response headers
print(f"Requests remaining: {response.headers.get('anthropic-ratelimit-requests-remaining')}")
print(f"Tokens remaining: {response.headers.get('anthropic-ratelimit-tokens-remaining')}")
print(f"Resets at: {response.headers.get('anthropic-ratelimit-requests-reset')}")

How to Increase Your Rate Limits

Spend more (automatic tier upgrades)

Build tiers upgrade automatically based on cumulative spend. Deposit $5 for Tier 1, spend $40 total for Tier 2, $200 for Tier 3, $400 for Tier 4. No application needed -- upgrades apply within minutes of reaching the threshold.

Apply for Scale tier

If Build Tier 4 limits are not enough, apply for the Scale tier through the Anthropic console. Scale provides the same or higher limits as Tier 4 plus priority access, a dedicated account contact, and the option for custom limits based on your workload.

Enterprise agreement

For the highest limits with guaranteed capacity, contact Anthropic sales. Enterprise agreements include custom RPM, TPM, and TPD limits based on your expected volume, with reserved capacity that is unaffected by platform-wide demand.

Optimise your usage pattern

Sometimes you can stay within limits by optimising rather than upgrading. Use the Batch API for non-real-time work (which has separate, higher limits). Implement request queuing with exponential backoff. Reduce token usage per request with shorter prompts and lower max_tokens settings.

Rate Limit Headers Explained

Every API response includes these headers. Use them to monitor your usage and implement smart retry logic.

Header	Description
anthropic-ratelimit-requests-limit	Maximum requests allowed in the current time window
anthropic-ratelimit-requests-remaining	Requests remaining before you hit the limit
anthropic-ratelimit-requests-reset	ISO 8601 timestamp when the request limit resets
anthropic-ratelimit-tokens-limit	Maximum tokens allowed in the current time window
anthropic-ratelimit-tokens-remaining	Tokens remaining before you hit the limit
anthropic-ratelimit-tokens-reset	ISO 8601 timestamp when the token limit resets
retry-after	Seconds to wait before retrying (only present on 429 responses)

Handling Rate Limit Errors (429 Responses)

What a 429 response means

A 429 status code means you have exceeded one of your rate limits (RPM, TPM, or TPD). The request was not processed and you are not charged. The response includes a retry-after header with the number of seconds to wait before retrying. This is the minimum wait time -- retrying sooner will result in another 429.

Recommended retry strategy

Implement exponential backoff with jitter. This prevents all your retries from hitting the API at the same instant when the rate limit window resets:

import anthropic
import time
import random

client = anthropic.Anthropic()

def call_with_retry(messages, max_retries=5):
    for attempt in range(max_retries):
        try:
            return client.messages.create(
                model="claude-sonnet-4-20250514",
                max_tokens=1024,
                messages=messages
            )
        except anthropic.RateLimitError as e:
            if attempt == max_retries - 1:
                raise
            # Exponential backoff with jitter
            wait = min(2 ** attempt + random.random(), 60)
            retry_after = e.response.headers.get("retry-after")
            if retry_after:
                wait = max(wait, float(retry_after))
            time.sleep(wait)

The Anthropic Python and Node.js SDKs include built-in retry logic with exponential backoff. If you use the official SDK, basic retry handling works out of the box. Custom retry logic is only needed if you want finer control over backoff timing or retry limits.

Proactive rate limit management

Instead of waiting for 429s, track the requests-remaining and tokens-remaining headers. When remaining capacity drops below 20%, start throttling your request rate. This avoids 429s entirely and provides a smoother experience.

Request queuing

For high-volume applications, implement a request queue with a token bucket rate limiter. Queue incoming requests and drain them at a rate just below your tier limit. This smooths out burst traffic and prevents 429 errors while maximising throughput.

Model-Specific Rate Limit Notes

While rate limit tiers apply to your account, different models have different effective throughput due to their size and processing speed.

Claude Opus 4

Lower RPM than listed tier limits

Opus requires more compute per request due to its size and reasoning depth. Expect roughly 50-75% of your tier's listed RPM.

Claude Sonnet 4

Standard tier limits apply

Sonnet is the baseline model for rate limit tiers. The limits in the table above apply directly to Sonnet requests.

Claude Haiku 3.5

Higher effective throughput

Haiku processes requests faster due to its smaller size. While the RPM limit is the same, you are less likely to hit TPM limits because responses are generated more quickly.

Enterprise Rate Limits

Enterprise customers receive custom rate limits negotiated as part of their agreement. These are not bound by the standard tier system and can be set to any level based on your requirements and Anthropic's capacity planning.

Reserved capacity

Enterprise limits are backed by reserved compute capacity, meaning your throughput is guaranteed regardless of overall platform demand.

Burst handling

Enterprise plans can include burst allowances that let you temporarily exceed your sustained rate limit for short periods without hitting 429 errors.

Per-model limits

Custom limits can be set per model. For example, high RPM for Haiku classification and moderate RPM for Opus reasoning, each with independent quotas.

Priority queuing

Enterprise requests receive priority in Anthropic's processing queue, resulting in lower latency even during peak platform usage.

Contact Anthropic sales to discuss enterprise rate limits for your workload.

Frequently Asked Questions

What happens when I hit a rate limit?

The API returns a 429 Too Many Requests response with a retry-after header indicating how many seconds to wait. Your request is not processed and you are not charged. Implement exponential backoff with jitter in your retry logic: wait the retry-after duration, then retry. If you consistently hit limits, consider upgrading your tier or optimising your request patterns.

How do I check my current rate limit tier?

Log in to console.anthropic.com and navigate to Settings > Plans. Your current tier is displayed along with your cumulative spend. You can also check your limits programmatically by reading the rate limit headers on any API response, which show your current limits and remaining quota.

How quickly do rate limit tiers upgrade?

Tier upgrades happen automatically as your cumulative spend reaches the threshold. Build Tier 1 requires a $5 deposit. Tier 2 unlocks at $40 cumulative spend, Tier 3 at $200, and Tier 4 at $400. Upgrades are applied within minutes of reaching the threshold. There is no application process for Build tiers.

Can I get higher limits than Tier 4 without an enterprise agreement?

Yes. The Scale tier provides limits equal to or higher than Build Tier 4, plus priority access and a dedicated account contact. You can apply for Scale tier through the Anthropic console. For limits beyond Scale, you need an enterprise agreement with custom terms negotiated with Anthropic's sales team.

Claude API Pricing Free Tier Enterprise Cost Calculator