This site is independently operated and is not affiliated with Anthropic. Verify pricing on Anthropic's official website.

Claude Sonnet 4 Pricing

Best balance of intelligence, speed, and cost. The model most developers choose for production workloads.

Input

$3.00

per MTok

Output

$15.00

per MTok

Cache Read

$0.30

per MTok (90% off)

Batch Input

$1.50

per MTok (50% off)
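At these rates, per-request cost is straightforward arithmetic. A minimal sketch in Python (the function name and structure are illustrative, not part of any Anthropic SDK):

```python
# Sonnet 4 rates from the table above, in dollars per million tokens (MTok).
INPUT_PER_MTOK = 3.00
OUTPUT_PER_MTOK = 15.00
CACHE_READ_PER_MTOK = 0.30   # 90% off input
BATCH_INPUT_PER_MTOK = 1.50  # 50% off input

def request_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Dollar cost of one request; cached_tokens are billed at the cache-read rate."""
    uncached = input_tokens - cached_tokens
    return (uncached * INPUT_PER_MTOK
            + cached_tokens * CACHE_READ_PER_MTOK
            + output_tokens * OUTPUT_PER_MTOK) / 1_000_000

# A typical chatbot request: 1,500 input + 600 output tokens
print(round(request_cost(1_500, 600), 4))  # → 0.0135
```

Output tokens dominate most bills at a 5:1 price ratio, which is why caching (input-only) helps less on generation-heavy workloads.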

What is Claude Sonnet 4?

Claude Sonnet 4 is Anthropic's mid-tier model, positioned between the budget Haiku and the premium Opus. It delivers strong performance on coding, analysis, content generation, and conversation tasks while keeping costs at 20% of what Opus charges for input and output tokens.

Sonnet handles the vast majority of production use cases. It excels at software engineering tasks (code generation, debugging, code review), structured data extraction, multi-turn conversations, and content creation. For most teams, Sonnet is the right default choice—you should only reach for Opus when Sonnet's quality is measurably insufficient for a specific task.

Context Window

200K tokens

~150,000 words

Max Output

8,192 tokens

Up to 64K tokens with extended output

Speed

Fast

Ideal for real-time apps

Real-World Cost Examples

Five common Sonnet 4 workloads with exact token counts and monthly costs.

SaaS chatbot: 1,000 conversations/day

Each conversation averages a 1,500-token prompt (system prompt + history + user message) and a 600-token assistant response.

1,500 input + 600 output × 1,000 req/day × 30 days

Standard

$405.00

per month

With caching

$283.50

per month

Code review tool: 500 PRs/week

Each PR diff is ~4,000 tokens of context. Claude produces a ~2,000-token review with inline suggestions.

4,000 input + 2,000 output × 71 req/day × 30 days

Standard

$89.46

per month

With caching

$66.46

per month

Content generation: 100 blog posts/month

A brief outline prompt (~1,200 tokens) produces a 3,000-word article (~4,000 tokens output).

1,200 input + 4,000 output × 3.33 req/day × 30 days

Standard

$6.35

per month

With caching

$6.03

per month

RAG Q&A over company docs

6,000-token context window (system prompt + retrieved chunks + question). Short 500-token answer.

6,000 input + 500 output × 500 req/day × 30 days

Standard

$382.50

per month

With caching

$139.50

per month

Customer email drafting assistant

Agent reads a customer email (~800 tokens) and drafts a reply (~400 tokens). 200 emails handled per day.

800 input + 400 output × 200 req/day × 30 days

Standard

$50.40

per month

With caching

$37.44

per month

“With caching” assumes 100% of input tokens are cached (best case). Real savings depend on your cache hit rate and what fraction of input is cacheable. Output costs remain the same.
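The arithmetic behind these examples can be sketched as a small helper with a cache-hit-rate parameter. This is a best-case estimate in line with the note above: it ignores the 25% cache-write premium and assumes the hit rate applies uniformly to all input tokens.

```python
def monthly_cost(input_tok: int, output_tok: int, req_per_day: float,
                 cache_hit: float = 0.0, days: int = 30) -> float:
    """Monthly dollar cost at Sonnet 4 rates ($3 input, $0.30 cache read, $15 output per MTok)."""
    reqs = req_per_day * days
    effective_input_rate = (1 - cache_hit) * 3.00 + cache_hit * 0.30
    input_cost = input_tok * reqs * effective_input_rate / 1_000_000
    output_cost = output_tok * reqs * 15.00 / 1_000_000
    return input_cost + output_cost

# SaaS chatbot example: 1,500 in + 600 out × 1,000 req/day
print(round(monthly_cost(1_500, 600, 1_000), 2))                 # → 405.0
print(round(monthly_cost(1_500, 600, 1_000, cache_hit=1.0), 2))  # → 283.5
```

Plugging in a realistic hit rate (say 0.7) gives a figure between the two columns above.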

When to Use Sonnet

  • Production chatbots and customer-facing assistants
  • Code generation, code review, and debugging
  • Content creation (articles, emails, marketing copy)
  • RAG-powered question-answering systems
  • Data extraction and document summarisation
  • Multi-turn conversations with context
  • Any task where you need quality close to Opus at 80% less cost

When NOT to Use Sonnet

  • Simple classification or routing (Haiku is 3.75x cheaper)
  • High-volume tasks where speed matters more than quality
  • Content moderation or spam filtering (Haiku handles this well)
  • Complex multi-step reasoning with high stakes (consider Opus)
  • Legal or scientific analysis requiring maximum accuracy (use Opus)
  • Tasks with a tight budget and tolerance for lower quality (use Haiku)

Sonnet vs Opus Comparison

Metric            | Sonnet 4                | Opus 4
Input price       | $3.00/MTok              | $15.00/MTok
Output price      | $15.00/MTok             | $75.00/MTok
Cost multiplier   | 1x (baseline)           | 5x Sonnet
Context window    | 200K tokens             | 200K tokens
Speed             | Fast                    | Slower
Coding quality    | Excellent               | Best available
Complex reasoning | Good                    | Best available
Best for          | 90% of production tasks | Hard problems, research

For most teams, Sonnet delivers 95%+ of Opus's quality at 20% of the cost. Reserve Opus for the hardest 5-10% of tasks via model routing.
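Model routing can be as simple as a gate in front of the API call that defaults to Sonnet and escalates only flagged tasks. A hypothetical sketch (the keyword hints and returned labels are placeholders, not a real routing policy or real model IDs):

```python
# Keywords that suggest a task belongs in the hardest 5-10% — tune for your workload.
HARD_TASK_HINTS = ("multi-step proof", "legal analysis", "novel research")

def pick_model(task_description: str) -> str:
    """Default to Sonnet; escalate to Opus (5x the cost) only when a hint matches."""
    text = task_description.lower()
    if any(hint in text for hint in HARD_TASK_HINTS):
        return "opus"
    return "sonnet"

print(pick_model("Draft a marketing email for our launch"))   # → sonnet
print(pick_model("Legal analysis of this licensing contract"))  # → opus
```

In production you would route on task metadata or a cheap classifier rather than substring matches, but the cost logic is the same: pay 5x only where it measurably matters.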

Sonnet vs Haiku Comparison

Metric          | Sonnet 4                 | Haiku 3.5
Input price     | $3.00/MTok               | $0.80/MTok
Output price    | $15.00/MTok              | $4.00/MTok
Cost multiplier | 3.75x Haiku (input)      | 1x (baseline)
Context window  | 200K tokens              | 200K tokens
Speed           | Fast                     | Fastest
Quality         | High                     | Good for simple tasks
Best for        | Complex production tasks | Classification, routing, extraction

Use Haiku for simple, high-volume tasks (classification, routing, extraction). Upgrade to Sonnet when the task requires deeper understanding or more nuanced output. See the full Haiku pricing breakdown.

Sonnet 4 Pricing FAQ

How much does Claude Sonnet 4 cost per request?
It depends on request size. A typical chatbot request (1,500 input + 600 output tokens) costs about $0.0135. A code review request (4,000 input + 2,000 output) costs about $0.042. You can use prompt caching to reduce the input portion by up to 90%.
Is Sonnet 4 the same as Claude 3.5 Sonnet?
No. Claude Sonnet 4 is the successor to Claude 3.5 Sonnet, with improved coding, reasoning, and instruction-following abilities. The pricing did not change: both models cost $3/$15 per MTok (same price point, better model).
When should I use Sonnet instead of Opus?
Use Sonnet when the task is well-defined and does not require deep multi-step reasoning. Sonnet handles 90%+ of production workloads (chatbots, code generation, content creation, RAG) at 5x lower cost than Opus. Switch to Opus only for complex research, legal analysis, or tasks where Sonnet's quality is measurably insufficient.
Can I combine Sonnet with prompt caching?
Yes. Cached input tokens cost $0.30/MTok instead of $3.00/MTok (90% savings). The initial cache write costs $3.75/MTok (25% premium). If your system prompt is 2,000 tokens and you make 100,000 requests/month, caching saves approximately $540/month on input costs.
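The savings figure above can be checked directly. This sketch ignores cache-write premiums (at $3.75/MTok, each 2,000-token write costs under a cent; how often writes occur depends on cache TTL and traffic pattern):

```python
# FAQ scenario: a 2,000-token system prompt cached across 100,000 requests/month.
system_tokens = 2_000
requests = 100_000
total_tokens = system_tokens * requests       # 200M system-prompt tokens per month

without_cache = total_tokens * 3.00 / 1_000_000   # all billed at the input rate
with_cache = total_tokens * 0.30 / 1_000_000      # all billed at the cache-read rate
print(round(without_cache - with_cache, 2))  # → 540.0
```

Note the saving applies only to the cacheable prefix; per-request user messages and all output tokens are billed at the normal rates.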
What is the maximum output length for Sonnet 4?
Sonnet 4 supports up to 200K tokens of context (input + output combined) and can generate outputs up to 8,192 tokens by default. You can request extended output of up to 64K tokens with the max_tokens parameter for longer-form content.