Updated March 2026
Claude Batch API Pricing
A flat 50% discount on all token costs for workloads that do not require real-time responses. Submit requests asynchronously and get results within 24 hours.
- Discount: 50% off all tokens
- Processing: 24-hour maximum window
- Batch size: 10,000 requests per batch
What Is the Claude Batch API?
The Batch API is an asynchronous processing mode for the Claude API that trades latency for cost. Instead of sending individual requests and receiving immediate responses, you submit a batch of up to 10,000 requests as a JSONL file. Anthropic processes them in the background, typically completing within a few hours, with a guaranteed maximum turnaround of 24 hours.
In exchange for accepting this processing delay, you receive a flat 50% discount on both input and output tokens across all Claude models. The discount applies regardless of model, volume, or token count. There is no minimum batch size -- even a batch of 10 requests gets the full discount.
The workflow is straightforward: create a JSONL file where each line is a standard Messages API request, upload it via the Batch API endpoint, and then either poll for completion or configure a webhook. When processing finishes, you download a results file containing the response for each request, indexed by the custom ID you assigned.
Standard vs Batch Pricing
All prices per 1 million tokens. The Batch API applies a 50% discount to both input and output tokens.
| Model | Standard Input | Standard Output | Batch Input | Batch Output |
|---|---|---|---|---|
| Claude Opus 4 | $15.00 | $75.00 | $7.50 | $37.50 |
| Claude Sonnet 4 | $3.00 | $15.00 | $1.50 | $7.50 |
| Claude Haiku 3.5 | $0.80 | $4.00 | $0.40 | $2.00 |
Combined savings: You can use prompt caching within batch requests. A cacheable batch request on Sonnet 4 pays $1.50/MTok for uncached input, $1.875/MTok for cache writes, and just $0.15/MTok for cache reads. Output remains $7.50/MTok. This combination can reduce total costs by 75% or more compared to standard uncached pricing.
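The stacking can be verified with simple arithmetic. A minimal sketch, assuming the standard prompt-caching multipliers (1.25× the base input rate for cache writes, 0.10× for cache reads) apply on top of the batch-discounted rate:

```python
# Sanity-check the stacked batch + caching rates for Sonnet 4.
STANDARD_INPUT = 3.00   # $/MTok, Sonnet 4 standard input
BATCH_DISCOUNT = 0.50   # flat 50% off all tokens

batch_input = STANDARD_INPUT * BATCH_DISCOUNT   # $1.50/MTok
batch_cache_write = batch_input * 1.25          # $1.875/MTok
batch_cache_read = batch_input * 0.10           # $0.15/MTok
```

These reproduce exactly the three batch figures quoted above.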
Ideal Use Cases for the Batch API
Perfect for batch processing
- Content generation. Produce blog posts, product descriptions, or marketing copy in bulk. Submit hundreds of generation requests and collect results when ready.
- Data processing and extraction. Parse invoices, extract entities from documents, or transform unstructured data into structured formats at scale.
- Evaluation and testing. Run model evaluations, regression tests, or A/B comparisons across thousands of test cases.
- Dataset labelling. Classify, tag, or annotate training data for ML pipelines. The 50% discount makes large-scale labelling economically viable.
- Document summarisation. Summarise large document collections overnight. Perfect for legal review, academic research, or knowledge base creation.
- Translation. Translate content libraries or localise product catalogues in batch rather than one at a time.
Not suitable for
- Real-time chat. Users expect sub-second responses in conversational interfaces. The batch processing window makes this impossible.
- Low-latency APIs. If your application serves API responses to end users, the 24-hour window is not acceptable.
- Interactive coding assistants. IDE integrations, code completion, and pair programming need immediate responses.
- Live customer support. Support agents and customers need real-time AI assistance, not next-day results.
- Streaming use cases. The Batch API does not support streaming responses. Each request returns a complete response.
Monthly Cost Comparison: Standard vs Batch
Same workloads processed through the standard API versus the Batch API. The model used depends on the task complexity.
| Workload | Input/req | Output/req | Standard | Batch | Savings |
|---|---|---|---|---|---|
| 10K document summaries | 8K | 1K | $390.00 | $195.00 | $195.00 |
| 50K classification tasks | 500 | 50 | $30.00 | $15.00 | $15.00 |
| 5K content articles | 1K | 3K | $240.00 | $120.00 | $120.00 |
| 100K data extractions | 2K | 200 | $240.00 | $120.00 | $120.00 |
| 1K research analyses | 10K | 5K | $525.00 | $262.50 | $262.50 |
Monthly costs calculated at stated volume. Does not include prompt caching, which would reduce input costs further.
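The table's totals can be reproduced from the per-request token counts. A minimal sketch; the model rates used (Sonnet 4 for summaries, Haiku 3.5 for classification) are assumptions inferred from the stated totals:

```python
def monthly_cost(requests, in_tokens, out_tokens, in_rate, out_rate, batch=False):
    """Total monthly cost in dollars; rates are $/MTok."""
    cost = (requests * in_tokens * in_rate + requests * out_tokens * out_rate) / 1e6
    return cost * 0.5 if batch else cost

# 10K document summaries (8K in / 1K out per request) at Sonnet 4 rates
summaries_std = monthly_cost(10_000, 8_000, 1_000, 3.00, 15.00)                # 390.0
summaries_batch = monthly_cost(10_000, 8_000, 1_000, 3.00, 15.00, batch=True)  # 195.0

# 50K classification tasks (500 in / 50 out per request) at Haiku 3.5 rates
classify_std = monthly_cost(50_000, 500, 50, 0.80, 4.00)                       # 30.0
```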
How to Use the Batch API
Prepare your JSONL file
Create a JSONL file where each line is a JSON object containing a custom_id (your reference) and a params object with the standard Messages API parameters: model, max_tokens, messages, and, optionally, a system prompt with cache_control.
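A minimal sketch of writing such a file; the model ID, system prompt, and cache_control block are illustrative:

```python
import json

# Each line: a custom_id plus standard Messages API params.
documents = ["First document text...", "Second document text..."]

with open("batch_input.jsonl", "w") as f:
    for i, doc in enumerate(documents):
        line = {
            "custom_id": f"doc-{i}",
            "params": {
                "model": "claude-sonnet-4-20250514",  # example model ID
                "max_tokens": 1024,
                "system": [{
                    "type": "text",
                    "text": "Summarise the document in three sentences.",
                    "cache_control": {"type": "ephemeral"},  # shared prompt is cacheable
                }],
                "messages": [{"role": "user", "content": doc}],
            },
        }
        f.write(json.dumps(line) + "\n")
```

Because every line carries the same system prompt, marking it with cache_control lets the batch benefit from the combined discount described above.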
Create the batch
Upload the file via POST /v1/messages/batches. You receive a batch ID to track progress. The API validates your file format immediately and returns errors if the structure is wrong.
Monitor progress
Poll GET /v1/messages/batches/{batch_id} to check status, or configure a webhook URL when creating the batch. Status progresses from 'in_progress' to 'ended'. Most batches complete well within the 24-hour window.
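A polling-loop sketch; `fetch_status` stands in for a GET request to /v1/messages/batches/{batch_id} and is injected so the loop logic stays self-contained:

```python
import time

def wait_for_batch(batch_id, fetch_status, interval=60, timeout=86_400):
    """Poll until the batch status reaches 'ended' or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if fetch_status(batch_id) == "ended":
            return True
        time.sleep(interval)  # back off between polls
    return False

# Simulated status progression for demonstration
states = iter(["in_progress", "in_progress", "ended"])
done = wait_for_batch("batch_123", lambda _id: next(states), interval=0)
```

In production a webhook avoids polling entirely; a loop like this is the fallback when webhooks are not configured.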
Download results
When the batch completes, download the results file. Each line maps your custom_id to the model's response, including token counts for cost tracking. Failed requests include error details.
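Indexing the results file by custom_id might look like the following sketch; the per-line result schema shown (a type field distinguishing succeeded from errored requests) is an assumption for illustration:

```python
import json

# Write a small sample results file standing in for the real download.
sample = [
    {"custom_id": "doc-0",
     "result": {"type": "succeeded",
                "message": {"content": [{"type": "text", "text": "Summary..."}],
                            "usage": {"input_tokens": 8000, "output_tokens": 950}}}},
    {"custom_id": "doc-1",
     "result": {"type": "errored", "error": {"message": "example error"}}},
]
with open("batch_results.jsonl", "w") as f:
    for line in sample:
        f.write(json.dumps(line) + "\n")

def index_results(path):
    """Map each custom_id to its result object."""
    results = {}
    with open(path) as f:
        for raw in f:
            line = json.loads(raw)
            results[line["custom_id"]] = line["result"]
    return results

results = index_results("batch_results.jsonl")
```

Summing the usage token counts across succeeded results gives the figures needed for cost tracking.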
Combining Batch + Caching for Maximum Savings
The Batch API and prompt caching are independent features that stack multiplicatively. When you use both, the Batch API first applies its 50% discount to all token costs, and then caching reduces the cached input portion by a further 90%. Here is what the combined pricing looks like for Sonnet 4:
- Standard input: $3.00/MTok
- Batch input: $1.50/MTok
- Batch + cache write: $1.875/MTok
- Batch + cache read: $0.15/MTok
That $0.15 per million tokens is a 95% reduction from the standard $3.00 rate. For a batch job processing 10,000 documents with a shared 5,000-token system prompt, the cached input alone saves over $140 compared to standard uncached pricing.
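That saving can be checked directly under the rates above:

```python
# 10,000 requests, each re-reading a shared 5,000-token cached prompt.
cached_tokens = 10_000 * 5_000                  # 50M prompt tokens in total
standard_cost = cached_tokens / 1e6 * 3.00      # standard uncached input
cache_read_cost = cached_tokens / 1e6 * 0.15    # batch + cache read
saving = standard_cost - cache_read_cost        # ~$142.50
```

The one-time cache write for the 5,000-token prompt adds under a cent, so it does not change the headline figure.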