Claude API Pricing: Per-Token Costs for Every Model

From $0.80 per million input tokens (Haiku) to $15 per million input tokens (Opus); output tokens cost more. Your actual cost depends on model choice, volume, and whether you use prompt caching or the Batch API.

Haiku 3.5

$0.80/MTok

Sonnet 4

$3.00/MTok

Opus 4

$15.00/MTok


Claude API Model Pricing

All prices are per 1 million tokens. One token is roughly 4 characters of English text.

Model | Input / 1M tokens | Output / 1M tokens
Claude Opus 4 | $15.00 | $75.00
Claude Sonnet 4 (popular) | $3.00 | $15.00
Claude Haiku 3.5 | $0.80 | $4.00

Token estimation guide

1 token ≈ 4 characters
1 token ≈ 0.75 English words
1,000 tokens ≈ 750 words
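These rules of thumb translate directly into a quick estimator. This is an approximation only; exact counts come from the per-request usage numbers the API returns:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters per token rule of thumb."""
    return max(1, round(len(text) / 4))

def estimate_tokens_from_words(word_count: int) -> int:
    """Rough token estimate using the ~0.75 words per token rule of thumb."""
    return round(word_count / 0.75)
```

For example, a 750-word document comes out at roughly 1,000 tokens, matching the guide above.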

API Cost Calculator

Estimate your monthly Claude API spend based on real usage patterns. The example figures below assume Claude Sonnet 4 at 1,000 requests per day.

Cost per request: $0.0105
Daily cost: $10.50
Monthly cost: $315.00
Annual projected cost: $3,832.50 (365 days)

Same workload on each model

Claude Opus 4: $1,575.00/mo
Claude Sonnet 4: $315.00/mo
Claude Haiku 3.5: $84.00/mo
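The calculator figures above are straight per-token arithmetic. This sketch assumes, consistent with the FAQ's typical request, 1,000 input and 500 output tokens per request at 1,000 requests per day:

```python
PRICES = {
    # USD per 1M tokens (input, output), from the pricing table above
    "opus-4": (15.00, 75.00),
    "sonnet-4": (3.00, 15.00),
    "haiku-3.5": (0.80, 4.00),
}

def cost_per_request(model: str, input_tokens: int, output_tokens: int) -> float:
    in_price, out_price = PRICES[model]
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

def monthly_cost(model: str, input_tokens: int, output_tokens: int,
                 requests_per_day: int, days: int = 30) -> float:
    return cost_per_request(model, input_tokens, output_tokens) * requests_per_day * days
```

`cost_per_request("sonnet-4", 1000, 500)` gives the $0.0105 per-request figure, and `monthly_cost` with 1,000 requests/day reproduces $315.00; swapping in `"opus-4"` or `"haiku-3.5"` reproduces the per-model comparison.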

Real-World Cost Examples

What common API workloads actually cost per month on each Claude model.

Use case | Opus 4 | Sonnet 4 | Haiku 3.5
1,000 customer support responses | $2,025.00 | $405.00 | $108.00
10,000 document summaries | $1,948.05 | $389.61 | $103.90
100 code reviews per day | $630.00 | $126.00 | $33.60
1M RAG queries per month | $127,498.73 | $25,499.75 | $6,799.93
500 content articles/day | $3,600.00 | $720.00 | $192.00
5,000 classification tasks/day | $1,687.50 | $337.50 | $90.00

Monthly totals; where a daily volume is stated, it is multiplied by 30 days. Prompt caching and Batch API discounts are not included.

Prompt Caching: The Biggest Cost Saver

Cache repeated input content and pay only 10% of the normal input token price for cached tokens.

How it works

When part of your API input is identical across requests (system prompts, shared context, few-shot examples), you can mark that section with the cache_control parameter. After the first request, Anthropic caches those tokens. Subsequent requests that include the same cached prefix pay only 10% of the standard input price for those tokens; writing the cache on the first request carries a 25% premium over the base input rate. The cache has a 5-minute TTL that resets with each cache hit.
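As a sketch, the cache breakpoint goes on the last static content block of the request body, so the system prompt and examples above it are cached while the user turn below is charged at the full input rate. The model ID here is an illustrative placeholder:

```python
def build_request(user_message: str, system_prompt: str, examples: str) -> dict:
    """Messages API request body with a cache breakpoint after the static prefix.

    Everything up to and including the block marked with cache_control is
    cacheable; only the content after it changes between requests.
    (Sketch of the request JSON, not an SDK call.)
    """
    return {
        "model": "claude-sonnet-4-20250514",  # placeholder; use a current model ID
        "max_tokens": 1024,
        "system": [
            {"type": "text", "text": system_prompt},
            {
                # last static block carries the cache breakpoint
                "type": "text",
                "text": examples,
                "cache_control": {"type": "ephemeral"},
            },
        ],
        "messages": [{"role": "user", "content": user_message}],
    }
```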

Break-even example

A chatbot sends a 2,000-token system prompt with every request. At 100,000 requests per month on Sonnet 4:

Without caching

$600.00/mo input cost

2K tokens x 100K requests x $3.00/MTok

With caching (100% cache hit)

$60.00/mo input cost

90% savings on 2K cached tokens

Savings: $540/month on input tokens alone
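The break-even arithmetic generalises. A sketch with the hit rate and the 10% cached price (the figure stated above) as parameters:

```python
def caching_savings(cached_tokens: int, requests_per_month: int,
                    input_price_per_mtok: float,
                    hit_rate: float = 1.0, cached_discount: float = 0.10):
    """Monthly input cost for the cached prefix: without caching, with caching,
    and the difference. Cache hits pay cached_discount of the normal price;
    misses pay full price."""
    full = cached_tokens * requests_per_month / 1e6 * input_price_per_mtok
    cached = full * (hit_rate * cached_discount + (1 - hit_rate))
    return full, cached, full - cached
```

`caching_savings(2000, 100_000, 3.00)` reproduces the $600 vs $60 example above, a $540/month saving at a 100% hit rate.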

When to use prompt caching

  • Any workload with a shared system prompt across requests
  • Few-shot examples included in every prompt
  • RAG pipelines where the retrieval context template is consistent
  • Multi-turn conversations with long conversation history prefixes
  • Any application making more than a few dozen requests per 5-minute window

Cache hit rates and TTL

The cache TTL is 5 minutes, refreshed on each hit. For applications with steady traffic (more than a few requests every 5 minutes), expect near-100% cache hit rates. Bursty workloads may see lower rates. Monitor your cache hit rate via the usage fields returned with each response (cache_creation_input_tokens and cache_read_input_tokens).
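Hit rate can be derived from those two counters, aggregated over a monitoring window; a sketch:

```python
def cache_hit_rate(cache_read_tokens: int, cache_creation_tokens: int) -> float:
    """Fraction of cacheable-prefix tokens served from cache rather than rewritten."""
    total = cache_read_tokens + cache_creation_tokens
    return cache_read_tokens / total if total else 0.0
```

For example, 900K tokens read from cache against 100K written gives a 90% hit rate.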

Batch API: 50% Off for Async Workloads

For workloads that do not need real-time responses, the Batch API offers a flat 50% discount on all token costs.

Ideal use cases

  • Bulk content generation
  • Data processing and extraction
  • Evaluation and testing pipelines
  • Bulk classification and tagging
  • Document summarisation at scale
  • Dataset labelling

Not suitable for

  • Chatbots and conversational AI
  • Real-time user-facing features
  • Interactive code assistants
  • Live customer support

Real-time vs Batch cost comparison

Processing 10,000 documents per month (8K input + 1K output tokens each) on Sonnet 4:

Real-time API

$390.00/mo

Batch API (50% off)

$195.00/mo

Batch requests are processed within a 24-hour window. You submit a JSONL file and retrieve results when processing completes.
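The comparison is again simple per-token arithmetic; a sketch with the 50% discount as a parameter:

```python
def batch_vs_realtime(docs: int, in_tokens: int, out_tokens: int,
                      in_price: float, out_price: float,
                      discount: float = 0.5):
    """Monthly cost of a document workload on the real-time API vs the Batch API."""
    realtime = docs * (in_tokens / 1e6 * in_price + out_tokens / 1e6 * out_price)
    return realtime, realtime * (1 - discount)
```

`batch_vs_realtime(10_000, 8000, 1000, 3.00, 15.00)` reproduces the $390 vs $195 example above.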

Claude API vs OpenAI vs Gemini vs Mistral

Input / output pricing per 1M tokens across major LLM API providers. Prices as of March 2026.

Tier | Claude (Anthropic) | OpenAI | Gemini (Google)
Top tier | Opus 4: $15 / $75 | GPT-4o: $2.50 / $10 | Gemini 2.5 Pro: $1.25 / $10
Mid tier | Sonnet 4: $3 / $15 | GPT-4o-mini: $0.15 / $0.60 | Gemini 2.0 Flash: $0.10 / $0.40
Budget tier | Haiku 3.5: $0.80 / $4 | GPT-4.1-nano: $0.10 / $0.40 | Gemini Flash Lite: $0.075 / $0.30

Same workload, different providers

10,000 customer support responses per month (2K input + 500 output tokens each):

Claude Sonnet 4: $135.00
GPT-4o: $100.00
Gemini 2.5 Pro: $75.00
Mistral Large: $70.00

Claude becomes more competitive when prompt caching is enabled. With 70% of input tokens cached, Sonnet 4 drops to approximately $97/mo for this workload ($22.20 input plus $75.00 output).
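The per-workload figures follow directly from the per-token prices in the comparison table (caching and batch discounts excluded):

```python
# (input, output) in USD per 1M tokens, from the comparison table above
PROVIDER_PRICES = {
    "Claude Sonnet 4": (3.00, 15.00),
    "GPT-4o": (2.50, 10.00),
    "Gemini 2.5 Pro": (1.25, 10.00),
}

def workload_cost(in_price: float, out_price: float,
                  requests: int, in_tokens: int, out_tokens: int) -> float:
    """Monthly cost of a fixed workload at given per-MTok prices."""
    return requests * (in_tokens / 1e6 * in_price + out_tokens / 1e6 * out_price)

# 10,000 support responses per month, 2K input + 500 output tokens each
costs = {name: workload_cost(i, o, 10_000, 2_000, 500)
         for name, (i, o) in PROVIDER_PRICES.items()}
```

Mistral Large is omitted here because its per-token prices are not in the table above.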

Note: LLM API pricing changes frequently. These figures are accurate as of March 2026. Always check the official pricing pages for the latest rates.

Cost Optimisation Strategies

Use the cheapest model that works

Start with Haiku for classification and routing. Use Sonnet for most production tasks. Reserve Opus for complex reasoning and research.

Route by complexity

Use Haiku to triage incoming requests and only escalate complex ones to Sonnet or Opus. This can cut costs by 60-80% for mixed workloads.
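One way to sketch the triage. A pure heuristic stands in for the Haiku classifier call here, and the model names and thresholds are illustrative, not official identifiers:

```python
def pick_model(prompt: str) -> str:
    """Toy complexity router: cheap requests go to Haiku, standard work to
    Sonnet, and long or reasoning-heavy prompts to Opus. In production the
    triage step would itself be a Haiku call that labels the request."""
    hard_markers = ("analyze", "prove", "architecture", "multi-step")
    est_tokens = len(prompt) / 4  # ~4 chars per token rule of thumb
    if est_tokens > 2000 or any(m in prompt.lower() for m in hard_markers):
        return "claude-opus-4"      # complex reasoning
    if est_tokens > 300:
        return "claude-sonnet-4"    # standard production tasks
    return "claude-haiku-3.5"       # simple, high-volume requests
```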

Enable prompt caching

Cache your system prompt and shared context. If 70% of your input is cacheable, you save roughly 63% on input token costs.

Use the Batch API

For anything that does not need real-time responses (content generation, data processing, evals), the Batch API gives you a flat 50% discount.

Shorten your prompts

Every token costs money. Trim unnecessary instructions, examples, and context. A focused 500-token prompt costs half as much as a 1,000-token one.

Set max_tokens

Always set a max_tokens limit to prevent runaway output. A response that generates 4,000 tokens when you only needed 500 costs 8x more than necessary.
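The 8x figure is just the ratio of generated tokens; as a sketch, max_tokens fixes an upper bound on output spend per request:

```python
def worst_case_output_cost(max_tokens: int, output_price_per_mtok: float) -> float:
    """Upper bound on output spend for one request, set by max_tokens."""
    return max_tokens / 1e6 * output_price_per_mtok

# On Sonnet 4 output pricing ($15/MTok):
runaway = worst_case_output_cost(4000, 15.00)  # $0.06 per request
bounded = worst_case_output_cost(500, 15.00)   # $0.0075 per request
```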

Monitor with response headers

Every API response includes a usage object with token counts. Track input_tokens, output_tokens, and the cache read and creation counts to find optimisation opportunities.

Negotiate volume discounts

Enterprise customers spending $10K+ per month can contact Anthropic for custom pricing. Scale-tier accounts also get higher rate limits.

Rate Limits and Throttling

Anthropic applies rate limits based on your account tier. Higher tiers unlock more throughput.

Tier | Requests/min | Tokens/min | Tokens/day
Free | 5 | 20,000 | 300,000
Build (Tier 1) | 50 | 40,000 | 1,000,000
Build (Tier 2) | 1,000 | 80,000 | 2,500,000
Build (Tier 3) | 2,000 | 160,000 | 5,000,000
Build (Tier 4) | 4,000 | 400,000 | 10,000,000
Scale | 4,000 | 400,000 | Custom

Rate limits vary by model. Opus has lower limits than Sonnet and Haiku. Contact Anthropic sales for custom limits on the Scale tier.

Getting Started with the Claude API

1. Create an Anthropic account

Sign up at console.anthropic.com. You will get limited free credits for evaluation.

2. Generate an API key

Navigate to Settings, then API Keys. Create a new key and store it securely. Never commit API keys to source control.

3. Install the SDK

Run pip install anthropic (Python) or npm install @anthropic-ai/sdk (Node.js). The SDK handles authentication, retries, and streaming.

4. Make your first request

Send a messages.create call with your chosen model, a system prompt, and a user message. The response includes token counts so you can track costs immediately.
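A minimal sketch using only the standard library; the SDK's messages.create wraps the same HTTP call. The model ID is an illustrative placeholder, and send() requires a real API key:

```python
import json
import urllib.request

API_URL = "https://api.anthropic.com/v1/messages"

def build_body(user_msg: str,
               model: str = "claude-sonnet-4-20250514",  # placeholder model ID
               max_tokens: int = 500) -> dict:
    """Request body for the Messages API; max_tokens caps output spend."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "system": "You are a concise assistant.",
        "messages": [{"role": "user", "content": user_msg}],
    }

def send(api_key: str, body: dict) -> dict:
    """POST the request and return the parsed JSON response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode(),
        headers={
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        out = json.load(resp)
    # out["usage"]["input_tokens"] and out["usage"]["output_tokens"]
    # are the billing basis for this request.
    return out
```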

5. Add prompt caching

Identify the static parts of your prompt (system instructions, few-shot examples) and add cache_control breakpoints. Monitor cache hit rates in response headers.

6. Monitor and optimise

Track your spending in the Anthropic dashboard. Use response headers to monitor per-request token usage. Switch to cheaper models for simpler tasks.

Frequently Asked Questions

How much does the Claude API cost?
Claude API pricing depends on the model you choose. Claude Haiku 3.5 costs $0.80 per million input tokens and $4.00 per million output tokens. Claude Sonnet 4 costs $3.00/$15.00, and Claude Opus 4 costs $15.00/$75.00. You only pay for the tokens you use, with no minimum commitment.
What is the cheapest Claude model?
Claude Haiku 3.5 is the cheapest model at $0.80 per million input tokens and $4.00 per million output tokens. It is ideal for classification, routing, simple question answering, and high-volume tasks where speed and cost matter more than deep reasoning.
How do I estimate my Claude API costs?
Use the calculator above. Enter your model, average input and output tokens per request, and requests per day. A rough rule: 1 token is about 4 characters of English text. A typical customer support response uses about 1,000 input tokens and 500 output tokens.
Is the Claude API cheaper than OpenAI?
It depends on the tier. Claude Sonnet 4 ($3/$15) is more expensive per token than GPT-4o ($2.50/$10), but Anthropic's prompt caching can reduce input costs by up to 90%, making it cheaper for workloads with repeated context. Claude Haiku 3.5 ($0.80/$4) competes with GPT-4o-mini ($0.15/$0.60), though the mini model is cheaper per token.
What is prompt caching and how much does it save?
Prompt caching lets you cache repeated parts of your input (like system prompts or shared context) and pay only 10% of the normal input price for cached tokens. If 70% of your input is cacheable, you save roughly 63% on input costs. A chatbot sending a 2,000-token system prompt with 100,000 requests per month would save around $540 on Sonnet.
What is the Claude Batch API?
The Batch API gives you a 50% discount on both input and output tokens for workloads that do not need real-time responses. Requests are processed within a 24-hour window. It is ideal for content generation, bulk classification, data processing, and evaluation pipelines.
How many tokens are in a typical request?
One token is roughly 4 characters or 0.75 words of English text. A typical customer support interaction uses 500 to 2,000 input tokens and 200 to 1,000 output tokens. A document summarisation request with a full document might use 4,000 to 8,000 input tokens and 500 to 1,500 output tokens.
Does Claude have a free tier?
Anthropic provides limited free API credits for evaluation when you create a new account. Beyond that, you pay per token. There is no permanent free tier for production use. Enterprise customers can negotiate volume discounts.