This site is independently operated and is not affiliated with Anthropic. Verify pricing on Anthropic's official website.

Updated March 2026

Claude Batch API Pricing

A flat 50% discount on all token costs for workloads that do not require real-time responses. Submit requests asynchronously and get results within 24 hours.

  • Discount: 50% off all tokens
  • Processing: 24-hour maximum window
  • Batch size: up to 10,000 requests per batch

What Is the Claude Batch API?

The Batch API is an asynchronous processing mode for the Claude API that trades latency for cost. Instead of sending individual requests and receiving immediate responses, you submit a batch of up to 10,000 requests as a JSONL file. Anthropic processes them in the background, typically completing within a few hours, with a guaranteed maximum turnaround of 24 hours.

In exchange for accepting this processing delay, you receive a flat 50% discount on both input and output tokens across all Claude models. The discount applies regardless of model, volume, or token count. There is no minimum batch size -- even a batch of 10 requests gets the full discount.

The workflow is straightforward: create a JSONL file where each line is a standard Messages API request, upload it via the Batch API endpoint, and then either poll for completion or configure a webhook. When processing finishes, you download a results file containing the response for each request, indexed by the custom ID you assigned.
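Concretely, each line of that JSONL file is a JSON object with your custom_id and the usual Messages API parameters. A minimal illustration (the model identifier and custom ID are placeholders):

```python
import json

# One batch request per line: a caller-chosen custom_id plus standard
# Messages API parameters under "params". The model name below is a
# placeholder; use a current model identifier from Anthropic's docs.
line = {
    "custom_id": "doc-0001",
    "params": {
        "model": "claude-sonnet-4",
        "max_tokens": 1024,
        "messages": [
            {"role": "user", "content": "Summarise the attached report."}
        ],
    },
}

print(json.dumps(line))
```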

Standard vs Batch Pricing

All prices per 1 million tokens. The Batch API applies a 50% discount to both input and output tokens.

| Model | Standard Input | Standard Output | Batch Input | Batch Output |
|---|---|---|---|---|
| Claude Opus 4 | $15.00 | $75.00 | $7.50 | $37.50 |
| Claude Sonnet 4 | $3.00 | $15.00 | $1.50 | $7.50 |
| Claude Haiku 3.5 | $0.80 | $4.00 | $0.40 | $2.00 |

Combined savings: You can use prompt caching within batch requests. A cacheable batch request on Sonnet 4 pays $1.50/MTok for uncached input, $1.875/MTok for cache writes, and just $0.15/MTok for cache reads; output stays at $7.50/MTok. For input-heavy, cache-friendly workloads this combination can cut total costs by 75% or more compared to standard uncached pricing.

Ideal Use Cases for the Batch API

Perfect for batch processing

  • Content generation. Produce blog posts, product descriptions, or marketing copy in bulk. Submit hundreds of generation requests and collect results when ready.
  • Data processing and extraction. Parse invoices, extract entities from documents, or transform unstructured data into structured formats at scale.
  • Evaluation and testing. Run model evaluations, regression tests, or A/B comparisons across thousands of test cases.
  • Dataset labelling. Classify, tag, or annotate training data for ML pipelines. The 50% discount makes large-scale labelling economically viable.
  • Document summarisation. Summarise large document collections overnight. Perfect for legal review, academic research, or knowledge base creation.
  • Translation. Translate content libraries or localise product catalogues in batch rather than one at a time.

Not suitable for

  • Real-time chat. Users expect sub-second responses in conversational interfaces. The batch processing window makes this impossible.
  • Low-latency APIs. If your application serves API responses to end users, the 24-hour window is not acceptable.
  • Interactive coding assistants. IDE integrations, code completion, and pair programming need immediate responses.
  • Live customer support. Support agents and customers need real-time AI assistance, not next-day results.
  • Streaming use cases. The Batch API does not support streaming responses. Each request returns a complete response.

Monthly Cost Comparison: Standard vs Batch

Same workloads processed through the standard API versus the Batch API. The model used depends on the task complexity.

| Workload | Model | Input/req | Output/req | Standard | Batch | Savings |
|---|---|---|---|---|---|---|
| 10K document summaries | Sonnet 4 | 8K | 1K | $390.00 | $195.00 | $195.00 |
| 50K classification tasks | Haiku 3.5 | 500 | 50 | $30.00 | $15.00 | $15.00 |
| 5K content articles | Sonnet 4 | 1K | 3K | $240.00 | $120.00 | $120.00 |
| 100K data extractions | Haiku 3.5 | 2K | 200 | $240.00 | $120.00 | $120.00 |
| 1K research analyses | Opus 4 | 10K | 5K | $525.00 | $262.50 | $262.50 |

Monthly costs calculated at stated volume. Does not include prompt caching, which would reduce input costs further.
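These figures follow directly from the per-model rates above. A quick check of the first row, assuming Sonnet 4 pricing:

```python
# Verify the first row: 10K summaries, 8K input + 1K output tokens each,
# at Sonnet 4 rates ($3.00/MTok in, $15.00/MTok out; batch is half price).
requests = 10_000
input_tok, output_tok = 8_000, 1_000
in_rate, out_rate = 3.00, 15.00  # dollars per million tokens, standard

standard = (requests * input_tok * in_rate
            + requests * output_tok * out_rate) / 1_000_000
batch = standard * 0.5

print(standard, batch)  # 390.0 195.0
```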

How to Use the Batch API

1. Prepare your JSONL file

Create a JSONL file where each line is a JSON object containing a custom_id (your reference) and a params object with the standard Messages API parameters: model, max_tokens, messages, and, optionally, a system prompt with cache_control for prompt caching.
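A minimal sketch of this step using only the standard library. The file name, custom_id scheme, and model identifier are illustrative:

```python
import json

def write_batch_file(prompts, path, model="claude-sonnet-4", max_tokens=1024):
    """Write one Messages API request per line, keyed by custom_id."""
    with open(path, "w", encoding="utf-8") as f:
        for i, prompt in enumerate(prompts):
            request = {
                "custom_id": f"req-{i:05d}",
                "params": {
                    "model": model,
                    "max_tokens": max_tokens,
                    "messages": [{"role": "user", "content": prompt}],
                },
            }
            f.write(json.dumps(request) + "\n")

write_batch_file(["Summarise document A.", "Summarise document B."],
                 "batch_input.jsonl")
```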

2. Create the batch

Upload the file via POST /v1/messages/batches. You receive a batch ID to track progress. The API validates your file format immediately and returns errors if the structure is wrong.
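A standard-library sketch of the create call. It assumes the endpoint accepts a JSON body containing a requests array along with x-api-key and anthropic-version headers; verify the exact upload format and header names against the official API reference before relying on this:

```python
import json
import urllib.request

def build_create_request(jsonl_lines, api_key,
                         url="https://api.anthropic.com/v1/messages/batches"):
    """Assemble the POST request for batch creation.

    Assumes the endpoint accepts {"requests": [...]} as a JSON body,
    built from the lines of your JSONL file.
    """
    body = {"requests": [json.loads(line) for line in jsonl_lines]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
        method="POST",
    )

# Sending it returns a batch object whose ID you use to track progress:
# req = build_create_request(open("batch_input.jsonl"), api_key="sk-...")
# batch = json.loads(urllib.request.urlopen(req).read())
```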

3. Monitor progress

Poll GET /v1/messages/batches/{batch_id} to check status, or configure a webhook URL when creating the batch. Status progresses from 'in_progress' to 'ended'. Most batches complete well within the 24-hour window.
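The polling loop can be kept transport-agnostic, which also makes it easy to test. This sketch assumes the status object exposes a processing_status field that ends at 'ended', as described above:

```python
import time

def wait_for_batch(batch_id, fetch, poll_seconds=60):
    """Poll until the batch reaches 'ended'.

    `fetch` takes a batch ID and returns the parsed status JSON, so the
    transport is pluggable: a real GET /v1/messages/batches/{batch_id}
    call, an SDK method, or a stub in tests.
    """
    while True:
        status = fetch(batch_id)
        if status.get("processing_status") == "ended":
            return status
        time.sleep(poll_seconds)
```

Note that 'ended' only means processing finished; individual requests inside the batch may still have failed and must be checked in the results file.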

4. Download results

When the batch completes, download the results file. Each line maps your custom_id to the model's response, including token counts for cost tracking. Failed requests include error details.
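Parsing the results is a one-pass split on the result type. This sketch assumes each line carries a result object whose type is "succeeded" or "errored"; check the field names against the current API reference:

```python
import json

def index_results(jsonl_lines):
    """Split a batch results file into successes and failures by custom_id."""
    ok, failed = {}, {}
    for line in jsonl_lines:
        entry = json.loads(line)
        result = entry["result"]
        if result["type"] == "succeeded":
            ok[entry["custom_id"]] = result["message"]
        else:
            failed[entry["custom_id"]] = result
    return ok, failed
```

Keeping the failures in their own dict makes it straightforward to build a retry batch from just the failed custom_ids.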

Combining Batch + Caching for Maximum Savings

The Batch API and prompt caching are independent features that stack multiplicatively. When you use both, the Batch API first applies its 50% discount to all token costs, and then caching reduces the cached input portion by a further 90%. Here is what the combined pricing looks like for Sonnet 4:

  • Standard input: $3.00/MTok
  • Batch input: $1.50/MTok
  • Batch + cache write: $1.875/MTok
  • Batch + cache read: $0.15/MTok

That $0.15 per million tokens is a 95% reduction from the standard $3.00 rate. For a batch job processing 10,000 documents with a shared 5,000-token system prompt, the cached input alone costs about $7.50 instead of $150 at the standard uncached rate, a saving of over $140.
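The arithmetic behind that figure, at Sonnet 4 rates (the one-off cache write, well under a cent at this prompt size, is ignored):

```python
# 10,000 documents sharing a 5,000-token cached system prompt.
docs, prompt_tok = 10_000, 5_000
mtok = docs * prompt_tok / 1_000_000  # 50 million tokens read from cache

standard_cost = mtok * 3.00     # standard uncached input rate
cache_read_cost = mtok * 0.15   # batch + cache read rate
savings = standard_cost - cache_read_cost

print(standard_cost, cache_read_cost, savings)  # 150.0 7.5 142.5
```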

Frequently Asked Questions

How long does Batch API processing take?
Batch API requests are processed within a 24-hour window. In practice, most batches complete much faster, often within a few hours depending on queue depth and batch size. You submit a JSONL file with your requests and poll for completion or set up a webhook to be notified when results are ready.
Can I combine the Batch API with prompt caching?
Yes. Batch processing and prompt caching stack. The Batch API gives you 50% off all token costs, and prompt caching reduces input costs by up to 90%. For cacheable batch workloads, your effective input cost can be as low as 5% of the standard rate. For example, Sonnet 4 cached batch input costs just $0.15 per million tokens instead of $3.00.
Is there a maximum batch size?
Each batch can contain up to 10,000 requests. If you need to process more, you can submit multiple batches. There is no limit on the number of batches you can have in flight simultaneously, though they are processed based on available capacity.
What happens if some requests in a batch fail?
Batch processing is fault-tolerant. If individual requests fail (e.g., due to content policy violations or malformed input), the rest of the batch continues processing. The results file includes both successful responses and error details for failed requests, so you can identify and retry failures without re-processing the entire batch.