Claude Sonnet 4 Pricing
Best balance of intelligence, speed, and cost. The model most developers choose for production workloads.
| Rate | Price |
|---|---|
| Input | $3.00 per MTok |
| Output | $15.00 per MTok |
| Cache Read | $0.30 per MTok (90% off) |
| Batch Input | $1.50 per MTok (50% off) |
What is Claude Sonnet 4?
Claude Sonnet 4 is Anthropic's mid-tier model, positioned between the budget Haiku and the premium Opus. It delivers strong performance on coding, analysis, content generation, and conversation tasks while keeping costs at 20% of what Opus charges for input and output tokens.
Sonnet handles the vast majority of production use cases. It excels at software engineering tasks (code generation, debugging, code review), structured data extraction, multi-turn conversations, and content creation. For most teams, Sonnet is the right default choice; reach for Opus only when Sonnet's quality is measurably insufficient for a specific task.
| Spec | Value | Notes |
|---|---|---|
| Context Window | 200K tokens | ~150,000 words |
| Max Output | 8,192 tokens | Up to 64K with extended |
| Speed | Fast | Ideal for real-time apps |
Real-World Cost Examples
Five common Sonnet 4 workloads with assumed average token counts per request and the resulting monthly costs.
SaaS chatbot: 1,000 conversations/day
Each conversation averages a 1,500-token prompt (system prompt + history + user message) and a 600-token assistant response.
(1,500 input + 600 output) × 1,000 req/day × 30 days
Standard
$405.00
per month
With caching
$283.50
per month
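The arithmetic behind these figures can be reproduced with a short script. This is a minimal sketch that uses only the listed Sonnet 4 prices; the function name `monthly_cost` and its parameters are illustrative, and the cached figure assumes every input token is billed at the cache-read rate.

```python
# Prices in USD per million tokens (MTok), from the listed Sonnet 4 rates.
INPUT_PER_MTOK = 3.00
OUTPUT_PER_MTOK = 15.00
CACHE_READ_PER_MTOK = 0.30


def monthly_cost(input_tokens, output_tokens, requests_per_day, days=30,
                 input_price=INPUT_PER_MTOK):
    """Monthly cost in USD for a workload with fixed per-request token counts."""
    requests = requests_per_day * days
    input_cost = input_tokens * requests / 1_000_000 * input_price
    output_cost = output_tokens * requests / 1_000_000 * OUTPUT_PER_MTOK
    return round(input_cost + output_cost, 2)


# SaaS chatbot: 1,500 input + 600 output tokens, 1,000 requests per day.
standard = monthly_cost(1_500, 600, 1_000)
# Best case: all input tokens billed at the cache-read price.
cached = monthly_cost(1_500, 600, 1_000, input_price=CACHE_READ_PER_MTOK)
print(standard, cached)  # 405.0 283.5
```

The same function reproduces the other examples by swapping in their token counts and request volumes.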
Code review tool: 500 PRs/week
Each PR diff is ~4,000 tokens of context. Claude produces a ~2,000-token review with inline suggestions.
(4,000 input + 2,000 output) × 71 req/day × 30 days
Standard
$89.46
per month
With caching
$66.46
per month
Content generation: 100 blog posts/month
A brief outline prompt (~1,200 tokens) produces a 3,000-word article (~4,000 tokens output).
(1,200 input + 4,000 output) × ~3.33 req/day × 30 days
Standard
$6.35
per month
With caching
$6.03
per month
RAG Q&A over company docs
6,000-token context window (system prompt + retrieved chunks + question). Short 500-token answer.
(6,000 input + 500 output) × 500 req/day × 30 days
Standard
$382.50
per month
With caching
$139.50
per month
Customer email drafting assistant
Agent reads a customer email (~800 tokens) and drafts a reply (~400 tokens). 200 emails handled per day.
(800 input + 400 output) × 200 req/day × 30 days
Standard
$50.40
per month
With caching
$37.44
per month
“With caching” assumes 100% of input tokens are billed at the cache-read rate of $0.30/MTok (best case) and leaves out cache-write charges. Real savings depend on your cache hit rate and on what fraction of each prompt is cacheable. Output costs are unchanged by caching.
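To see how far a real workload might sit from the best case, the estimate can be parameterised by how much of each prompt is cacheable and how often the cache is hit. The sketch below is an illustration only: `cacheable_fraction` and `hit_rate` are hypothetical inputs you would measure yourself, the 50% and 80% values are made up for the example, and cache-write charges are not modelled.

```python
def monthly_cost_with_cache(input_tokens, output_tokens, requests_per_day,
                            cacheable_fraction, hit_rate, days=30):
    """Monthly USD cost when only part of the input is served from cache.

    cacheable_fraction: share of each prompt that could be cached (0.0 to 1.0).
    hit_rate: share of requests where that cacheable part is a cache read.
    Cache-write charges are not modelled (simplifying assumption).
    """
    requests = requests_per_day * days
    total_input_mtok = input_tokens * requests / 1_000_000
    cached_mtok = total_input_mtok * cacheable_fraction * hit_rate
    uncached_mtok = total_input_mtok - cached_mtok
    output_mtok = output_tokens * requests / 1_000_000
    # Listed Sonnet 4 prices: $3.00 input, $0.30 cache read, $15.00 output per MTok.
    cost = uncached_mtok * 3.00 + cached_mtok * 0.30 + output_mtok * 15.00
    return round(cost, 2)


# RAG Q&A example: 6,000 input + 500 output tokens, 500 requests per day.
best_case = monthly_cost_with_cache(6_000, 500, 500, 1.0, 1.0)
partial = monthly_cost_with_cache(6_000, 500, 500, 0.5, 0.8)
print(best_case, partial)  # 139.5 285.3
```

With half of each prompt cacheable and an 80% hit rate, the RAG example lands at about $285 per month, between the $382.50 standard figure and the $139.50 best case.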
When to Use Sonnet
- Production chatbots and customer-facing assistants
- Code generation, code review, and debugging
- Content creation (articles, emails, marketing copy)
- RAG-powered question-answering systems
- Data extraction and document summarisation
- Multi-turn conversations with context
- Any task where you need quality close to Opus at 80% less cost
When NOT to Use Sonnet
- Simple classification or routing (Haiku is 3.75x cheaper)
- High-volume tasks where speed matters more than quality
- Content moderation or spam filtering (Haiku handles this well)
- Complex multi-step reasoning with high stakes (consider Opus)
- Legal or scientific analysis requiring maximum accuracy (use Opus)
- Tasks with a tight budget and tolerance for lower quality (use Haiku)
Sonnet vs Opus Comparison
| Metric | Sonnet 4 | Opus 4 |
|---|---|---|
| Input price | $3.00/MTok | $15.00/MTok |
| Output price | $15.00/MTok | $75.00/MTok |
| Cost multiplier | 1x (baseline) | 5x Sonnet |
| Context window | 200K tokens | 200K tokens |
| Speed | Fast | Slower |
| Coding quality | Excellent | Best available |
| Complex reasoning | Good | Best available |
| Best for | 90% of production tasks | Hard problems, research |
For most teams, Sonnet delivers 95%+ of Opus's quality at 20% of the cost. Reserve Opus for the hardest 5-10% of tasks via model routing.
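The cost effect of routing a small share of traffic to Opus can be estimated with a blended rate. This is a rough sketch using the prices in the table above; the function names are illustrative, and it assumes both models see the same token counts per request, which may not hold in practice.

```python
# USD per million tokens (MTok), from the comparison table.
SONNET = {"input": 3.00, "output": 15.00}
OPUS = {"input": 15.00, "output": 75.00}


def cost_per_request(prices, input_tokens, output_tokens):
    """USD cost of one request at the given per-MTok prices."""
    return (input_tokens * prices["input"]
            + output_tokens * prices["output"]) / 1_000_000


def blended_monthly_cost(input_tokens, output_tokens, requests, opus_share):
    """Monthly USD cost when `opus_share` of requests are routed to Opus."""
    sonnet = cost_per_request(SONNET, input_tokens, output_tokens)
    opus = cost_per_request(OPUS, input_tokens, output_tokens)
    return round(requests * ((1 - opus_share) * sonnet + opus_share * opus), 2)


# Chatbot workload: 1,500 input + 600 output tokens, 30,000 requests per month.
print(blended_monthly_cost(1_500, 600, 30_000, 0.0))   # 405.0  (all Sonnet)
print(blended_monthly_cost(1_500, 600, 30_000, 0.10))  # 567.0  (10% to Opus)
print(blended_monthly_cost(1_500, 600, 30_000, 1.0))   # 2025.0 (all Opus)
```

Under these assumptions, sending 10% of requests to Opus raises the bill by about 40% over Sonnet alone, compared with 5x for moving everything to Opus.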
Sonnet vs Haiku Comparison
| Metric | Sonnet 4 | Haiku 3.5 |
|---|---|---|
| Input price | $3.00/MTok | $0.80/MTok |
| Output price | $15.00/MTok | $4.00/MTok |
| Cost multiplier | 3.75x Haiku (input and output) | 1x (baseline) |
| Context window | 200K tokens | 200K tokens |
| Speed | Fast | Fastest |
| Quality | High | Good for simple tasks |
| Best for | Complex production tasks | Classification, routing, extraction |
Use Haiku for simple, high-volume tasks (classification, routing, extraction). Upgrade to Sonnet when the task requires deeper understanding or more nuanced output. See the full Haiku pricing breakdown.
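One simple way to apply this guidance is a lookup that maps a task category to a model tier. The sketch below is hypothetical: the task labels and the `pick_model` function are invented for illustration, and the returned strings are tier names, not API model identifiers.

```python
# Task categories are illustrative labels, loosely based on the guidance above.
HAIKU_TASKS = {"classification", "routing", "moderation"}
OPUS_TASKS = {"legal_analysis", "scientific_analysis", "complex_reasoning"}


def pick_model(task_type: str) -> str:
    """Return a model tier name for a task category, defaulting to Sonnet."""
    if task_type in HAIKU_TASKS:
        return "haiku"
    if task_type in OPUS_TASKS:
        return "opus"
    return "sonnet"


print(pick_model("classification"))  # haiku
print(pick_model("code_review"))     # sonnet
print(pick_model("legal_analysis"))  # opus
```

Defaulting to Sonnet keeps unknown task types on the mid-tier model, in line with treating it as the default choice.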