Batch API: Reduce Bandwidth Waste and Improve API Efficiency

Developers often struggle with slow response times and high network costs when sending thousands of separate API calls. The Batch API addresses this by combining multiple independent requests into one operation, reducing latency, bandwidth usage, and connection overhead.

This article explains what Batch API is, how it differs from standard APIs, and how Novita AI’s Batch API enables large-scale asynchronous inference through structured JSONL input, efficient file handling, and reliable error tracking. It also outlines key efficiency factors such as cost, latency, and throughput, and provides a concise guide to implementation and monitoring.

Try Batch Inference at 50% Off


Run large-scale predictions faster and cheaper with supported models:
qwen/qwen3-vl-235b-a22b-instruct, openai/gpt-oss-120b, deepseek/deepseek-r1-0528, qwen/qwen3-4b-f

What is Batch API?

Batch API: Combines multiple independent API calls (e.g., GET, POST, PUT, DELETE) into a single HTTP request.
Response Structure: The server returns the results of all sub-requests in order, indicating success or failure for each.
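
The idea above can be sketched as a pair of payloads. This is a generic illustration only: the `/batch` endpoint shape and field names (`requests`, `responses`) are hypothetical, not a specific provider's API.

```python
# Hypothetical shape of a generic batch request and its combined response.
batch_request = {
    "requests": [
        {"method": "GET", "path": "/user/1"},
        {"method": "GET", "path": "/user/2"},
        {"method": "POST", "path": "/order", "body": {"item": "sku-42"}},
    ]
}

batch_response = {
    "responses": [
        {"status": 200, "body": {"id": 1, "name": "Alice"}},
        {"status": 200, "body": {"id": 2, "name": "Bob"}},
        {"status": 201, "body": {"order_id": 99}},
    ]
}

# Results come back in the same order as the sub-requests, so each result
# can be matched to its originating call by index and checked individually.
assert len(batch_response["responses"]) == len(batch_request["requests"])
```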

Key Difference Between Batch API and Standard API

Normal Requests

Client
 ├──► Request 1 (/user/1)
 │       └──► Server Response 1
 ├──► Request 2 (/user/2)
 │       └──► Server Response 2
 └──► Request 3 (/order)
         └──► Server Response 3

Batch Request

Client
 └──► Single Request (/batch)
          ├─ Sub-request 1: GET /user/1
          ├─ Sub-request 2: GET /user/2
          └─ Sub-request 3: POST /order
          ↓
       Server processes all
          ↓
       Combined Response:
          [Result1, Result2, Result3]

Batch API can help:

Reduce network latency by sending one combined request instead of many.

Lower bandwidth and connection overhead, since headers and handshakes are shared.

Improve client performance, especially on mobile or slow networks.

Simplify transactional logic, enabling unified error handling or rollback.

Optimize API Gateway throughput, preventing request flooding.
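
A back-of-envelope calculation shows where the savings in the list above come from. The handshake latency and header size below are assumed, illustrative numbers, not measurements.

```python
# Rough overhead estimate: each standalone HTTPS call pays a connection
# handshake plus its own request headers; a batch pays both only once.
n_requests = 1000
handshake_ms = 50     # assumed TLS handshake latency per connection
header_bytes = 500    # assumed request header size per call

separate_latency_ms = n_requests * handshake_ms
separate_header_kb = n_requests * header_bytes / 1024

batched_latency_ms = handshake_ms        # one connection, one handshake
batched_header_kb = header_bytes / 1024  # headers sent once

print(f"separate: {separate_latency_ms} ms of handshakes, {separate_header_kb:.0f} KB of headers")
print(f"batched:  {batched_latency_ms} ms of handshake,  {batched_header_kb:.1f} KB of headers")
```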

Typical Use Cases of Batch API

1. Bulk data queries: Retrieve multiple users, products, or posts at once to avoid repeated requests.
2. Bulk write or update: Create or update multiple records in one operation (e.g., batch upload, inventory update).
3. Front-end performance optimization: Reduce the number of HTTP calls from browsers or mobile apps for faster load times.
4. Backend task aggregation: In microservice systems, merge several internal API calls into one external call.
5. Data synchronization: Sync multiple resource states or execute batch operations (e.g., tagging, deletion).
6. Rate-limit optimization: Decrease API Gateway load and save bandwidth by consolidating requests.

Key Factors Affecting Batch API Efficiency

How much cost can Batch APIs save compared to real-time APIs?

Industry analysis (Growth-onomics) shows cost reductions of about 20–45%, mainly from fewer network round trips, lower connection overhead, and concentrated processing, though exact savings depend on call frequency, batch size, and system design.

What about latency—can Batch APIs really finish “within 24 hours”?

Batch APIs usually run asynchronously with much higher latency than real-time APIs; many systems execute hourly or daily, so “within 24 hours” depends on the SLA rather than being guaranteed.

Does batch size or file size affect speed?

Yes. Larger batches (e.g., more JSONL lines) increase transfer and parsing time almost linearly; while throughput improves, total completion time per batch grows.
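
The near-linear relationship can be sketched with a simple model. The coefficients below (setup time and per-line cost) are illustrative assumptions, not measured values.

```python
# Simple linear model of batch completion time:
# total_time ≈ fixed setup + per-line transfer/parse cost.
def estimated_batch_seconds(n_lines: int, setup_s: float = 5.0,
                            per_line_s: float = 0.002) -> float:
    """Rough estimate; both coefficients are illustrative assumptions."""
    return setup_s + n_lines * per_line_s

for n in (1_000, 10_000, 50_000):
    print(f"{n} lines -> ~{estimated_batch_seconds(n):.0f} s")
```

Throughput per line improves with batch size (the fixed setup cost is amortized), but the total time for any single batch still grows with its line count.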

Why are Batch APIs better for high-throughput workloads?

By aggregating thousands of requests into one process, Batch APIs reduce per-call overhead and allow parallel execution or caching reuse, often improving throughput by 17–92% in large-scale operations, though this comes at the cost of higher latency.

How to Use Batch API?

Novita’s Batch API is highly compatible with OpenAI’s interface, supporting /v1/chat/completions and /v1/completions so existing code can be reused with minimal changes. It accepts .jsonl input files where each line represents an individual request to the same model, identified by a unique custom_id for easy tracking. The output is also in JSONL format, making large-scale post-processing, analysis, and integration straightforward and efficient.

1. Prepare Your Batch Input File

Create a .jsonl file, where each line is one API request in JSON format.
Example (batch_input.jsonl):

{"custom_id": "req-1", "body": {"model": "deepseek/deepseek-v3-0324", "messages": [{"role": "user", "content": "Summarize: batch API basics"}], "max_tokens": 200}}
{"custom_id": "req-2", "body": {"model": "deepseek/deepseek-v3-0324", "messages": [{"role": "system", "content": "You are concise."},{"role": "user", "content": "List 3 batch API use cases"}], "max_tokens": 150}}

Rules:

  • One request per line.
  • All requests must use the same model.
  • Each line must include a unique custom_id.
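
The rules above can be enforced programmatically when building the file. This sketch generates a small `batch_input.jsonl` and validates it before upload; the model name and prompts mirror the example and are illustrative.

```python
import json

MODEL = "deepseek/deepseek-v3-0324"
prompts = ["Summarize: batch API basics", "List 3 batch API use cases"]

# Write one JSON request per line, each with a unique custom_id.
with open("batch_input.jsonl", "w", encoding="utf-8") as f:
    for i, prompt in enumerate(prompts, start=1):
        row = {
            "custom_id": f"req-{i}",
            "body": {
                "model": MODEL,
                "messages": [{"role": "user", "content": prompt}],
                "max_tokens": 200,
            },
        }
        f.write(json.dumps(row) + "\n")

# Validate before upload: unique IDs, single model across all lines.
with open("batch_input.jsonl", encoding="utf-8") as f:
    rows = [json.loads(line) for line in f if line.strip()]
ids = [r["custom_id"] for r in rows]
assert len(ids) == len(set(ids)), "custom_id values must be unique"
assert len({r["body"]["model"] for r in rows}) == 1, "all requests must use the same model"
```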

2. Upload the Input File and Create a Batch

Use Python or curl to upload the file and immediately start the batch job.

Python

from openai import OpenAI

client = OpenAI(base_url="https://api.novita.ai/openai/v1", api_key="YOUR_API_KEY")

# Upload + create batch
uploaded = client.files.create(file=open("batch_input.jsonl", "rb"), purpose="batch")

batch = client.batches.create(
    input_file_id=uploaded.id,
    endpoint="/v1/chat/completions",
    completion_window="48h"
)

print("file_id:", uploaded.id)
print("batch_id:", batch.id)

curl

export API_KEY="YOUR_API_KEY"

# Upload file
upload_response=$(curl -s -X POST \
  -H "Authorization: Bearer ${API_KEY}" \
  -F 'file=@batch_input.jsonl' -F 'purpose=batch' \
  https://api.novita.ai/openai/v1/files)

# Extract file_id and start batch
file_id=$(echo $upload_response | jq -r '.id')

curl -X POST https://api.novita.ai/openai/v1/batches \
  -H "Authorization: Bearer ${API_KEY}" \
  -H "Content-Type: application/json" \
  -d "{\"input_file_id\": \"$file_id\", \"endpoint\": \"/v1/chat/completions\", \"completion_window\": \"48h\"}"

3. Check Batch Status

You can check progress anytime using the batch ID.

batch = client.batches.retrieve("batch_xxx")
print(batch.status)
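
For unattended jobs, the status check can be wrapped in a polling loop. A minimal sketch, assuming `client` is the OpenAI client created earlier and that the terminal status names match the list that follows (adjust them to whatever your provider actually returns):

```python
import time

TERMINAL_STATUSES = {"COMPLETED", "FAILED", "EXPIRED", "CANCELLED"}

def is_terminal(status: str) -> bool:
    """True once a batch can no longer make progress."""
    return status.upper() in TERMINAL_STATUSES

def wait_for_batch(client, batch_id: str, poll_every_s: int = 60):
    """Poll the batch until it reaches a terminal state (sketch)."""
    while True:
        batch = client.batches.retrieve(batch_id)
        print("status:", batch.status)
        if is_terminal(batch.status):
            return batch
        time.sleep(poll_every_s)
```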

Statuses include:

  • VALIDATING – input file being checked
  • IN_PROGRESS – running
  • COMPLETED – finished successfully
  • FAILED – failed
  • EXPIRED – exceeded the 48-hour window

4. Retrieve Results

Once completed, download the result file via output_file_id:

output = client.files.content("file_xxx")
print(output.read().decode("utf-8"))

Each line in the output corresponds to one input, matched by custom_id.
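
Because outputs arrive as JSONL keyed by `custom_id`, it is convenient to index them for post-processing. A sketch (the sample result line below is fabricated and minimal; real result lines carry the full response body):

```python
import json

def results_by_custom_id(raw_jsonl: str) -> dict:
    """Index JSONL result lines by their custom_id for easy joining."""
    results = {}
    for line in raw_jsonl.splitlines():
        if not line.strip():
            continue
        row = json.loads(line)
        results[row["custom_id"]] = row
    return results

# Minimal fabricated result line for illustration:
sample = '{"custom_id": "req-1", "response": {"status_code": 200}}'
indexed = results_by_custom_id(sample)
print(indexed["req-1"]["response"]["status_code"])  # -> 200
```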

Tips:

  • Supported models are limited (e.g., deepseek/deepseek-r1-0528); check the current supported-model list
  • Up to 50 000 requests per batch
  • Input file ≤ 100 MB
  • Completion window fixed at 48 hours
  • Output retained for 30 days

How Should Failed Requests Be Handled in a Batch API?

Errors encountered during batch processing are logged in a separate error file, accessible via the error_file_id field. Each failed sub-request includes an error code and description. Common examples:

Error Code  Description             Solution
400         Invalid request format  Check JSONL syntax and required fields
401         Authentication failed   Verify API key
404         Batch not found         Check batch ID
429         Rate limit exceeded     Reduce request frequency
500         Server error            Contact provider or retry later

Developers should reprocess only the failed entries using a retry queue or smaller follow-up batch, instead of resubmitting the entire original file.
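
The retry step above can be sketched as a filter over the original input file. This assumes each line of the error file is a JSON object carrying the failed request's `custom_id`, per the description above; the helper name and file paths are illustrative.

```python
import json

def build_retry_file(input_path: str, error_jsonl: str, retry_path: str) -> int:
    """Copy only the failed requests into a smaller follow-up batch file.

    Returns the number of requests kept for retry.
    """
    failed_ids = {
        json.loads(line)["custom_id"]
        for line in error_jsonl.splitlines()
        if line.strip()
    }
    kept = 0
    with open(input_path, encoding="utf-8") as src, \
         open(retry_path, "w", encoding="utf-8") as dst:
        for line in src:
            if line.strip() and json.loads(line)["custom_id"] in failed_ids:
                dst.write(line)
                kept += 1
    return kept
```

The resulting file is a valid batch input on its own, so it can be uploaded and submitted exactly like the original, without resubmitting requests that already succeeded.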

What Endpoints Does Novita AI Provide for Batch API Operations?

Endpoint               Purpose
Create batch           Submit a new batch job containing multiple requests.
Retrieve batch         Get the status or results of a specific batch by its ID.
Cancel batch           Stop a running batch job before completion.
List batches           List all submitted batch jobs for the account.
Upload file            Upload input data files (e.g., JSONL).
List files             View all uploaded files.
Retrieve file          Get file metadata by file ID.
Delete file            Remove an uploaded file.
Retrieve file content  Download the actual contents of a file (e.g., results or error log).

Batch API consolidates many small requests into one efficient workflow. By using Novita AI’s Batch API, developers can cut network costs by up to 45%, scale throughput for up to 50 000 requests per batch, and simplify error handling through built-in logging and retrieval endpoints. While it sacrifices real-time speed, it delivers exceptional efficiency for bulk inference, synchronization, and data-processing workloads.

Novita AI is an AI cloud platform that offers developers an easy way to deploy AI models using our simple API, while also providing an affordable and reliable GPU cloud for building and scaling.

