Claude API Pricing: Per-Token Costs for Every Model
From $0.80 per million input tokens (Haiku 3.5) to $15.00 per million input tokens (Opus 4), with output tokens priced higher. Your actual cost depends on model choice, volume, and whether you use prompt caching or the Batch API.
- Haiku 3.5: $0.80/MTok input
- Sonnet 4: $3.00/MTok input
- Opus 4: $15.00/MTok input
Claude API Model Pricing
All prices are per 1 million tokens. One token is roughly 4 characters of English text.
| Model | Input / 1M tokens | Output / 1M tokens |
|---|---|---|
| Claude Opus 4 | $15.00 | $75.00 |
| Claude Sonnet 4 (popular) | $3.00 | $15.00 |
| Claude Haiku 3.5 | $0.80 | $4.00 |
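As a sketch of how these per-token prices translate into per-request costs, using the rough 4-characters-per-token heuristic (the function and dictionary names here are illustrative, not part of any SDK):

```python
# Rough per-request cost estimator based on the pricing table above.
PRICES_PER_MTOK = {               # (input, output) in USD per 1M tokens
    "claude-opus-4": (15.00, 75.00),
    "claude-sonnet-4": (3.00, 15.00),
    "claude-haiku-3.5": (0.80, 4.00),
}

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token of English text."""
    return max(1, len(text) // 4)

def request_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of a single request at the listed per-million-token prices."""
    inp, out = PRICES_PER_MTOK[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# A 4,000-character prompt (~1,000 tokens) with a 500-token reply on Sonnet 4:
cost = request_cost_usd("claude-sonnet-4", estimate_tokens("x" * 4000), 500)
print(f"${cost:.4f}")  # -> $0.0105
```

Real token counts depend on the tokenizer, so treat the character heuristic as an estimate only; the API reports exact counts with every response.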
API Cost Calculator
Estimate your monthly Claude API spend based on real usage patterns. Example workload: 1,000 requests per day on Sonnet 4, each using roughly 1,000 input and 500 output tokens.
| Metric | Cost |
|---|---|
| Cost per request | $0.0105 |
| Daily cost | $10.50 |
| Monthly cost (30 days) | $315.00 |
| Projected annual cost (365 days) | $3,832.50 |
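The calculator's projection is simple arithmetic; a minimal sketch (the function name is illustrative):

```python
# Project a per-request cost to daily, monthly, and annual spend.
def project_costs(cost_per_request: float, requests_per_day: int) -> dict:
    daily = cost_per_request * requests_per_day
    return {
        "daily": round(daily, 2),
        "monthly": round(daily * 30, 2),   # 30-day month, as in the figures above
        "annual": round(daily * 365, 2),
    }

print(project_costs(0.0105, 1000))
# -> {'daily': 10.5, 'monthly': 315.0, 'annual': 3832.5}
```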
Real-World Cost Examples
What common API workloads actually cost per month on each Claude model.
| Use case | Opus 4 | Sonnet 4 | Haiku 3.5 |
|---|---|---|---|
| 1,000 customer support responses | $2,025.00 | $405.00 | $108.00 |
| 10,000 document summaries | $1,948.05 | $389.61 | $103.90 |
| 100 code reviews per day | $630.00 | $126.00 | $33.60 |
| 1M RAG queries per month | $127,498.73 | $25,499.75 | $6,799.93 |
| 500 content articles/day | $3,600.00 | $720.00 | $192.00 |
| 5,000 classification tasks/day | $1,687.50 | $337.50 | $90.00 |
Monthly costs; volumes stated per day are multiplied by 30 days. Figures do not include prompt caching or Batch API discounts.
Prompt Caching: The Biggest Cost Saver
Cache repeated input content and pay only 10% of the normal input token price for cached tokens.
How it works
When part of your API input is identical across requests (system prompts, shared context, few-shot examples), you can mark that section with the cache_control parameter. After the first request, Anthropic caches those tokens. Subsequent requests that include the same cached prefix pay only 10% of the standard input price for those tokens. The cache has a 5-minute TTL that resets with each cache hit.
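A sketch of where cache_control sits in a Messages API request body, following Anthropic's prompt-caching request shape (the model id and prompt text are illustrative):

```python
import json

# Messages API request body with a cached system prompt. The cache_control
# breakpoint marks everything up to and including that block as cacheable;
# later requests with the identical prefix are billed as cache reads.
body = {
    "model": "claude-sonnet-4",   # illustrative model id
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "You are a support assistant for ExampleCo...",  # long, static prompt
            "cache_control": {"type": "ephemeral"},                  # cacheable breakpoint
        }
    ],
    "messages": [
        {"role": "user", "content": "How do I reset my password?"}   # varies per request
    ],
}

print(json.dumps(body, indent=2))
```

Only the static prefix should carry the breakpoint; the per-user message stays outside it so every request can still hit the cache.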
Break-even example
A chatbot sends a 2,000-token system prompt with every request. At 100,000 requests per month on Sonnet 4:
Without caching
$600.00/mo input cost
2K tokens x 100K requests x $3.00/MTok
With caching (100% cache hit)
$60.00/mo input cost
90% savings on 2K cached tokens
Savings: $540/month on input tokens alone
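The break-even arithmetic generalises to any cached-token volume; a minimal sketch (names are illustrative):

```python
def caching_savings(cached_tokens: int, requests: int,
                    input_price_per_mtok: float, hit_rate: float = 1.0) -> float:
    """Monthly input-cost savings from prompt caching.

    Cache reads are billed at 10% of the normal input price,
    so each hit saves 90% on the cached tokens.
    """
    full_cost = cached_tokens * requests * input_price_per_mtok / 1_000_000
    return full_cost * 0.9 * hit_rate

# 2,000 cached tokens x 100,000 requests on Sonnet 4 ($3.00/MTok input):
print(f"${caching_savings(2_000, 100_000, 3.00):.2f}")  # -> $540.00
```

Note that the request which first writes the cache is billed at a premium over the base input price for those tokens, which is negligible across 100,000 requests and is ignored here.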
When to use prompt caching
- Any workload with a shared system prompt across requests
- Few-shot examples included in every prompt
- RAG pipelines where the retrieval context template is consistent
- Multi-turn conversations with long conversation history prefixes
- Any application making more than a few dozen requests per 5-minute window
Cache hit rates and TTL
The cache TTL is 5 minutes, refreshed on each hit. For applications with steady traffic (more than a few requests every 5 minutes), expect near-100% cache hit rates. Bursty workloads may see lower rates. Monitor your cache hit rate via the usage object in each API response (cache_creation_input_tokens and cache_read_input_tokens).
Batch API: 50% Off for Async Workloads
For workloads that do not need real-time responses, the Batch API offers a flat 50% discount on all token costs.
Ideal use cases
- Bulk content generation
- Data processing and extraction
- Evaluation and testing pipelines
- Bulk classification and tagging
- Document summarisation at scale
- Dataset labelling
Not suitable for
- Chatbots and conversational AI
- Real-time user-facing features
- Interactive code assistants
- Live customer support
Real-time vs Batch cost comparison
Processing 10,000 documents per month (8K input + 1K output tokens each) on Sonnet 4:
Real-time API
$390.00/mo
Batch API (50% off)
$195.00/mo
Batch requests are processed within a 24-hour window. You submit a JSONL file and retrieve results when processing completes.
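A sketch of assembling the batch request file, one JSON object per line with a unique custom_id so results can be matched back after processing (the document text and model id are illustrative):

```python
import json

# Each line is one request: a custom_id plus the normal Messages API params.
documents = {
    "doc-001": "First document text...",
    "doc-002": "Second document text...",
}

lines = []
for doc_id, text in documents.items():
    lines.append(json.dumps({
        "custom_id": doc_id,              # used to match results to inputs
        "params": {
            "model": "claude-sonnet-4",   # illustrative model id
            "max_tokens": 1024,
            "messages": [
                {"role": "user", "content": f"Summarise this document:\n\n{text}"}
            ],
        },
    }))

batch_jsonl = "\n".join(lines)
print(f"{len(lines)} batch requests prepared")
```

All requests in the file are billed at the 50% batch rate regardless of how quickly they complete within the 24-hour window.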
Claude API vs OpenAI vs Gemini vs Mistral
Input / output pricing per 1M tokens across major LLM API providers. Prices as of March 2026.
| Tier | Claude (Anthropic) | OpenAI | Gemini (Google) |
|---|---|---|---|
| Top tier | Opus 4: $15 / $75 | GPT-4o: $2.50 / $10 | Gemini 2.5 Pro: $1.25 / $10 |
| Mid tier | Sonnet 4: $3 / $15 | GPT-4o-mini: $0.15 / $0.60 | Gemini 2.0 Flash: $0.10 / $0.40 |
| Budget tier | Haiku 3.5: $0.80 / $4 | GPT-4.1-nano: $0.10 / $0.40 | Gemini Flash Lite: $0.075 / $0.30 |
Same workload, different providers
10,000 customer support responses per month (2K input + 500 output tokens each):
Claude Sonnet 4
$135.00
GPT-4o
$100.00
Gemini 2.5 Pro
$75.00
Mistral Large
$70.00
Claude becomes more competitive when prompt caching is enabled. With 70% of input tokens cached, Sonnet 4 drops to approximately $97/mo for this workload.
Note: LLM API pricing changes frequently. These figures are accurate as of March 2026. Always check the official pricing pages for the latest rates.
Cost Optimisation Strategies
Use the cheapest model that works
Start with Haiku for classification and routing. Use Sonnet for most production tasks. Reserve Opus for complex reasoning and research.
Route by complexity
Use Haiku to triage incoming requests and only escalate complex ones to Sonnet or Opus. This can cut costs by 60-80% for mixed workloads.
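A sketch of complexity-based routing. The triage heuristic below is a stand-in for illustration; in practice you would have Haiku itself classify each request and escalate based on its label:

```python
# Route each request to the cheapest adequate model.
# Stand-in heuristic: reasoning-heavy or very long prompts escalate upward.
REASONING_HINTS = ("analyze", "analyse", "prove", "architecture", "trade-off")

def choose_model(prompt: str) -> str:
    if any(hint in prompt.lower() for hint in REASONING_HINTS):
        return "claude-opus-4"        # complex reasoning
    if len(prompt) > 2_000:
        return "claude-sonnet-4"      # long but routine
    return "claude-haiku-3.5"         # cheap default for simple requests

print(choose_model("What are your opening hours?"))  # -> claude-haiku-3.5
```

Because the bulk of traffic in most mixed workloads is simple, routing the default path to Haiku is where the 60-80% saving comes from.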
Enable prompt caching
Cache your system prompt and shared context. If 70% of your input is cacheable, you save roughly 63% on input token costs.
Use the Batch API
For anything that does not need real-time responses (content generation, data processing, evals), the Batch API gives you a flat 50% discount.
Shorten your prompts
Every token costs money. Trim unnecessary instructions, examples, and context. A focused 500-token prompt costs half as much as a 1,000-token one.
Set max_tokens
Always set a max_tokens limit to prevent runaway output. A response that generates 4,000 tokens when you only needed 500 costs 8x more than necessary.
Monitor token usage per response
Every API response includes a usage object with exact token counts. Track input_tokens, output_tokens, and the cache read/creation counts to find optimisation opportunities.
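A sketch of turning per-response token counts into a running spend figure. The field names follow the Messages API usage object; the prices are Sonnet 4's from the table above, and cache-write premiums are omitted for brevity:

```python
# Accumulate cost from the usage counts returned with each response.
SONNET_INPUT, SONNET_OUTPUT = 3.00, 15.00   # USD per 1M tokens

def response_cost(usage: dict) -> float:
    """Cost of one response; cache reads are billed at 10% of the input price."""
    cost = usage.get("input_tokens", 0) * SONNET_INPUT
    cost += usage.get("cache_read_input_tokens", 0) * SONNET_INPUT * 0.10
    cost += usage.get("output_tokens", 0) * SONNET_OUTPUT
    return cost / 1_000_000

usage = {"input_tokens": 200, "cache_read_input_tokens": 1800, "output_tokens": 500}
print(f"${response_cost(usage):.6f}")  # -> $0.008640
```

Summing this per request (and logging it against the calling feature) shows exactly which parts of an application drive spend.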
Negotiate volume discounts
Enterprise customers spending $10K+ per month can contact Anthropic for custom pricing. Scale-tier accounts also get higher rate limits.
Rate Limits and Throttling
Anthropic applies rate limits based on your account tier. Higher tiers unlock more throughput.
| Tier | Requests/min | Tokens/min | Tokens/day |
|---|---|---|---|
| Free | 5 | 20,000 | 300,000 |
| Build (Tier 1) | 50 | 40,000 | 1,000,000 |
| Build (Tier 2) | 1,000 | 80,000 | 2,500,000 |
| Build (Tier 3) | 2,000 | 160,000 | 5,000,000 |
| Build (Tier 4) | 4,000 | 400,000 | 10,000,000 |
| Scale | 4,000 | 400,000 | Custom |
Rate limits vary by model. Opus has lower limits than Sonnet and Haiku. Contact Anthropic sales for custom limits on the Scale tier.
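When you exceed a tier's limit the API returns HTTP 429. A minimal retry sketch with exponential backoff and jitter (the exception class and the flaky call are placeholders for your SDK's rate-limit error and your real API call):

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for the SDK's HTTP 429 exception."""

def with_backoff(call, max_retries: int = 5, base: float = 1.0):
    """Retry a callable on rate-limit errors with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError:
            time.sleep(base * (2 ** attempt) + random.uniform(0, base))
    raise RuntimeError("still rate limited after retries")

# Demo: a call that fails twice, then succeeds (tiny base delay for the demo).
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError()
    return "ok"

print(with_backoff(flaky, base=0.01))  # -> ok
```

Backoff keeps bursty clients inside the requests-per-minute budget; for sustained overruns, the fix is a higher tier, not more retries.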
Getting Started with the Claude API
Create an Anthropic account
Sign up at console.anthropic.com. You will get limited free credits for evaluation.
Generate an API key
Navigate to Settings, then API Keys. Create a new key and store it securely. Never commit API keys to source control.
Install the SDK
Run pip install anthropic (Python) or npm install @anthropic-ai/sdk (Node.js). The SDK handles authentication, retries, and streaming.
Make your first request
Send a messages.create call with your chosen model, a system prompt, and a user message. The response includes token counts so you can track costs immediately.
Add prompt caching
Identify the static parts of your prompt (system instructions, few-shot examples) and add cache_control breakpoints. Monitor cache hit rates in response headers.
Monitor and optimise
Track your spending in the Anthropic dashboard. Use response headers to monitor per-request token usage. Switch to cheaper models for simpler tasks.