MiniMax M3 is available through Novita AI with the model ID minimax/minimax-m3, an OpenAI-compatible base URL, a 1,000,000-token context window, and tiered pricing for longer prompts. This quick start keeps the setup practical: copy the model ID, set your API key, run one small request, then scale up to longer prompts once the basics work.
Table of Contents
- When to Use This Quick Start
- Step 1: Get Your Novita API Key
- Step 2: Confirm Model ID and Endpoint
- Step 3: Send Your First Request
- Step 4: Read the Response
- Step 5: Check Pricing, Limits, and Common Errors
- Python Example
- Send the Request with cURL
- Key Parameters
- Troubleshooting
- FAQ
When to Use This Quick Start
Use this quick start when you want to test MiniMax M3 through a serverless API path instead of building around raw model hosting. It is for developers who already have a prompt or workload in mind and need the endpoint, model ID, token limits, and pricing details before writing a small proof of concept.
MiniMax M3 is a strong fit when your request needs a large context budget, structured output, tool-oriented tasks, or coding assistance over long inputs. On Novita AI, the current model page lists text, image, and video as accepted input modalities, text as the output modality, and support for serverless access, function calling, structured output, reasoning, and Anthropic API compatibility.
This is not a benchmark deep dive or a launch announcement. The goal is simpler: make one clean request, then decide whether MiniMax M3 fits your application.
Step 1: Get Your Novita API Key
Create or select a Novita AI account, open your API key settings, and generate a key for server-side use. Keep the key out of client-side code, frontend bundles, public repositories, and notebooks that may be shared outside your team.
Set the key as an environment variable before running the examples:
```bash
export NOVITA_API_KEY="your_api_key_here"
```
If you are testing in a team environment, use a scoped project key or a temporary key if your account setup supports it. Rotate the key after public demos, shared experiments, or any accidental exposure.
Step 2: Confirm Model ID and Endpoint
Before you write code, keep the MiniMax M3 connection details in one place:
| Field | Value |
|---|---|
| Model ID | minimax/minimax-m3 |
| Base URL (OpenAI-compatible) | https://api.novita.ai/openai |
| OpenAI SDK base_url | https://api.novita.ai/openai/v1 |
| Chat completions URL | https://api.novita.ai/openai/v1/chat/completions |
| Context window | 1,000,000 tokens |
| Maximum output | 131,072 tokens |
| Inputs | Text, image, video |
| Output | Text |
| Serverless support | Supported |
| Function calling | Supported |
| Structured output | Supported |
| Reasoning | Supported |
| Anthropic API compatibility | Supported |
Check the MiniMax M3 model documentation before you ship, since availability, pricing, and limits can change.
Step 3: Send Your First Request
Start with a short text-only chat request. It is much easier to debug authentication, routing, and response parsing before you add a large prompt.
For the first prompt, keep the task short and the expected answer easy to judge, and use a low temperature so repeated runs stay similar. For example: Summarize the main implementation risks in a long-context code review process.
Keep max_tokens modest for the first call. MiniMax M3 supports much longer output, but the first job is to confirm that the integration works.
Step 4: Read the Response
An OpenAI-compatible chat completion response usually returns the assistant's answer at choices[0].message.content.
Also log the request ID or response metadata your runtime exposes. Those details are useful when a request fails or runs slowly. For cost tracking, record prompt size, output size, cache-read usage if your workload uses cached context, and whether the request entered the long-context pricing band.
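A minimal sketch of that logging step, assuming the response body follows the usual OpenAI-compatible shape (id, choices, usage). The sample body, the `summarize_response` helper, and the `prompt_tokens_details.cached_tokens` field are illustrative assumptions, not confirmed Novita output; check a real response before relying on any field name.

```python
import json

# Start of the long-context pricing band on Novita AI (see Step 5).
LONG_CONTEXT_THRESHOLD = 524_288

# Hypothetical response body in the OpenAI-compatible shape; values are illustrative.
SAMPLE_BODY = json.dumps({
    "id": "chatcmpl-example",
    "model": "minimax/minimax-m3",
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "Stale context and missed cross-file dependencies."},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 42, "completion_tokens": 9, "total_tokens": 51},
})


def summarize_response(raw: str) -> dict:
    """Pull out the answer plus the fields worth logging for debugging and cost tracking."""
    body = json.loads(raw)
    choice = body["choices"][0]
    usage = body.get("usage") or {}
    # Assumption: cached prompt tokens, if reported, use the OpenAI-style nested field.
    details = usage.get("prompt_tokens_details") or {}
    prompt_tokens = usage.get("prompt_tokens", 0)
    return {
        "request_id": body.get("id"),
        "answer": choice["message"]["content"],
        # In OpenAI-compatible APIs, "length" usually means max_tokens cut the answer off.
        "finish_reason": choice.get("finish_reason"),
        "prompt_tokens": prompt_tokens,
        "completion_tokens": usage.get("completion_tokens", 0),
        "cached_tokens": details.get("cached_tokens", 0),
        "long_context_band": prompt_tokens >= LONG_CONTEXT_THRESHOLD,
    }


record = summarize_response(SAMPLE_BODY)
print(record["request_id"], record["prompt_tokens"], record["long_context_band"])
```

Logging the whole record, rather than only the answer, keeps the request ID and token counts next to each other when you later investigate a slow or expensive call.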
Do not treat the first output as proof that your prompt is ready for users. Once the integration works, test prompts that look like your real workload: long codebases, multi-file instructions, tool schemas, structured JSON output, or multimodal inputs if your application needs them.
Step 5: Check Pricing, Limits, and Common Errors
MiniMax M3 uses tiered pricing on Novita AI. The price changes once the prompt enters the long-context band:
| Prompt size band | Input | Output | Cache read |
|---|---|---|---|
| Less than 524,288 tokens | $0.30 per 1M tokens | $1.20 per 1M tokens | $0.06 per 1M tokens |
| 524,288 to 1,000,000 tokens | $1.20 per 1M tokens | $4.80 per 1M tokens | $0.24 per 1M tokens |
That split matters. A 50,000-token test and a near-1M-token request are not priced the same way. When you estimate cost, include prompt length, expected output length, cache behavior, retries, and how often users may send very large requests.
MiniMax M3 currently has a 1,000,000-token context window and a 131,072-token maximum output. Before shipping, recheck the MiniMax M3 model documentation for the latest price table and any rate-limit guidance attached to your account.
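To turn the price table into a per-request estimate, a small helper like the one below can be enough. It is a sketch under two assumptions the table does not settle: the band is chosen from the full prompt size with cached tokens included, and cached tokens are billed at the cache-read rate instead of the input rate. The function name and constants are illustrative; copy current prices from the model page. With these numbers, a 50,000-token prompt with 2,000 output tokens comes to about $0.0174, while an 800,000-token prompt with the same output comes to about $0.97.

```python
# USD per 1M tokens, copied from the pricing table above; recheck before relying on them.
STANDARD_RATES = {"input": 0.30, "output": 1.20, "cache_read": 0.06}
LONG_CONTEXT_RATES = {"input": 1.20, "output": 4.80, "cache_read": 0.24}
LONG_CONTEXT_THRESHOLD = 524_288
CONTEXT_WINDOW = 1_000_000
MAX_OUTPUT = 131_072


def estimate_cost_usd(prompt_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Rough cost of one request; multiply by expected retries and request volume."""
    if prompt_tokens > CONTEXT_WINDOW:
        raise ValueError("prompt exceeds the 1,000,000-token context window")
    if output_tokens > MAX_OUTPUT:
        raise ValueError("output exceeds the 131,072-token maximum")
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    # Assumption: the band depends on total prompt size, cached tokens included.
    rates = LONG_CONTEXT_RATES if prompt_tokens >= LONG_CONTEXT_THRESHOLD else STANDARD_RATES
    uncached_tokens = prompt_tokens - cached_tokens
    total = (
        uncached_tokens * rates["input"]
        + cached_tokens * rates["cache_read"]
        + output_tokens * rates["output"]
    )
    return total / 1_000_000


print(f"{estimate_cost_usd(50_000, 2_000):.4f}")
print(f"{estimate_cost_usd(800_000, 2_000):.4f}")
```

The jump at 524,288 tokens is the point of the exercise: crossing the threshold raises the rate for the whole request, including output tokens, not just the tokens above the line.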
Common setup errors include:
- Missing or malformed `Authorization` header.
- Using the wrong model ID, such as a display name instead of `minimax/minimax-m3`.
- Sending requests to the wrong base URL.
- Setting `max_tokens` higher than your application can safely consume.
- Testing long-context prompts without accounting for the higher pricing band.
- Passing multimodal content in a shape that your client library does not support.
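Several of these mistakes can be caught before the first network call. The sketch below checks the API key, model ID, SDK base URL, and `max_tokens` cap against the values in Step 2. The `preflight` function is a hypothetical helper, and its base URL check assumes you are using the OpenAI SDK path rather than the raw chat completions URL.

```python
import os
from typing import Mapping

EXPECTED_MODEL = "minimax/minimax-m3"
SDK_BASE_URL = "https://api.novita.ai/openai/v1"
MAX_OUTPUT = 131_072


def preflight(model: str, base_url: str, max_tokens: int,
              env: Mapping[str, str] = os.environ) -> list:
    """Return a list of configuration problems; an empty list means the basics look right."""
    problems = []
    key = env.get("NOVITA_API_KEY", "").strip()
    if not key:
        problems.append("NOVITA_API_KEY is not set in this process")
    elif key.lower().startswith("bearer "):
        # The SDK or curl command adds "Bearer "; the variable should hold only the key.
        problems.append("NOVITA_API_KEY should hold the bare key, without a 'Bearer ' prefix")
    if model != EXPECTED_MODEL:
        problems.append(f"model is {model!r}; expected the model ID {EXPECTED_MODEL!r}")
    if base_url.rstrip("/") != SDK_BASE_URL:
        problems.append(f"base_url is {base_url!r}; the OpenAI SDK example uses {SDK_BASE_URL!r}")
    if not 1 <= max_tokens <= MAX_OUTPUT:
        problems.append(f"max_tokens={max_tokens} is outside 1..{MAX_OUTPUT}")
    return problems
```

A typical use is to call `preflight("minimax/minimax-m3", "https://api.novita.ai/openai/v1", 600)` at startup and stop with the returned messages if the list is not empty, so configuration mistakes never reach the API as confusing 401 or 404 responses.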
Python Example
This example uses the OpenAI Python SDK with Novita AI’s OpenAI-compatible base URL.
```python
import os

from openai import OpenAI

# Point the OpenAI SDK at Novita AI's OpenAI-compatible endpoint.
client = OpenAI(
    api_key=os.environ["NOVITA_API_KEY"],
    base_url="https://api.novita.ai/openai/v1",
)

response = client.chat.completions.create(
    model="minimax/minimax-m3",  # the model ID, not the display name
    messages=[
        {
            "role": "system",
            "content": "You are a concise API assistant for software teams.",
        },
        {
            "role": "user",
            "content": "Summarize the main implementation risks in a long-context code review process.",
        },
    ],
    temperature=0.2,  # low variability for repeatable tests
    max_tokens=600,  # small output cap for the first call
)

# The assistant's text is in the first choice.
answer = response.choices[0].message.content
print(answer)
```
Send the Request with cURL
If you prefer cURL, keep the JSON body in a payload variable. This makes the request easier to read and avoids cramming the full JSON body into a single command.
```bash
payload='{
  "model": "minimax/minimax-m3",
  "messages": [
    {
      "role": "system",
      "content": "You are a concise API assistant for software teams."
    },
    {
      "role": "user",
      "content": "Summarize the main implementation risks in a long-context code review process."
    }
  ],
  "temperature": 0.2,
  "max_tokens": 600
}'

curl --request POST "https://api.novita.ai/openai/v1/chat/completions" \
  --header "Authorization: Bearer $NOVITA_API_KEY" \
  --header "Content-Type: application/json" \
  --data "$payload"
```
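If neither cURL nor the OpenAI SDK is available, the same request can be built with Python's standard library. This is a sketch that mirrors the payload above; `build_request` and `send` are hypothetical helper names, and `send` is not called automatically, so the block can be loaded without making a network request.

```python
import json
import urllib.request

CHAT_URL = "https://api.novita.ai/openai/v1/chat/completions"


def build_request(api_key: str, user_prompt: str, max_tokens: int = 600) -> urllib.request.Request:
    """Build the same POST request as the cURL example, without sending it."""
    payload = {
        "model": "minimax/minimax-m3",
        "messages": [
            {"role": "system", "content": "You are a concise API assistant for software teams."},
            {"role": "user", "content": user_prompt},
        ],
        "temperature": 0.2,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON body; HTTP errors raise urllib.error.HTTPError."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

To run it, pass `os.environ["NOVITA_API_KEY"]` and your prompt to `build_request`, hand the result to `send`, and read `["choices"][0]["message"]["content"]` from the returned dictionary. On a 4xx or 5xx status, the raised `HTTPError` can be read like a response, and its body may contain the error details.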
Key Parameters
| Parameter | What it controls | Start with |
|---|---|---|
| model | Which hosted model answers the request | minimax/minimax-m3 |
| messages | System and user instructions | A short, text-only prompt |
| temperature | Output variability | 0.2 for repeatable tests |
| max_tokens | Maximum generated output | A small cap, then raise it later |
| stream | Whether tokens stream back progressively | Enable after the basic call works |
| tools | Function/tool definitions | Add one tool at a time |
| response_format | Structured response shape | Validate the output before using it |
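When you enable stream, an OpenAI-compatible endpoint typically sends server-sent events: `data:` lines carrying JSON chunks whose text sits in `choices[0].delta.content`, followed by `data: [DONE]`. The sketch below reassembles the text from such lines, assuming Novita's stream follows that OpenAI format; verify this against a real streamed response before production use, as the FAQ recommends. The OpenAI Python SDK does this parsing for you when you pass `stream=True`, so the helper mainly matters for raw HTTP clients, and its name is hypothetical.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Join the text deltas of choice 0 from OpenAI-style server-sent event lines."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and ": keep-alive" comment lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            if choice.get("index", 0) != 0:
                continue  # only the first choice is collected
            text = (choice.get("delta") or {}).get("content")
            if text:
                parts.append(text)
    return "".join(parts)
```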
For multimodal inputs, confirm the exact request shape in your SDK or API documentation before relying on image or video prompts. The model page lists modality support, but request formatting depends on the client path you use.
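As one possible shape, the OpenAI chat format represents an image as a content part of type `image_url`, and a base64 data URL can carry the image bytes inline. Whether Novita accepts this exact shape for MiniMax M3 is an assumption to confirm against the API documentation. The `image_message` helper is hypothetical, and video input is not covered because its request shape is not documented here.

```python
import base64


def image_message(prompt: str, image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Build one user message with text plus an inline image, in the OpenAI content-part format."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{encoded}"}},
        ],
    }
```

The returned dictionary drops into the messages list in place of a plain text user message. Inline base64 inflates the request body, so large images add noticeably to the serialized prompt size.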
Troubleshooting
Authentication fails
Check that NOVITA_API_KEY is set in the same shell or runtime where you run the request. The Authorization header must use the bearer-token format: Authorization: Bearer $NOVITA_API_KEY.
The API cannot find the model
Confirm that the request uses minimax/minimax-m3, not MiniMax M3, minimax-m3, or a blog title. Model display names and model IDs are not interchangeable.
The request works for short prompts but fails for long prompts
Measure the serialized input, not just the visible word count. Tool schemas, retrieved documents, image references, and conversation history all count. If you are getting close to 1,000,000 tokens, try a smaller prompt and add truncation or retrieval logic before retrying.
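A rough way to act on that advice is sketched below. It measures the serialized messages rather than the visible text and drops the oldest non-system messages until the estimate fits a budget. The 4-characters-per-token ratio is a common heuristic for English text, not MiniMax M3's tokenizer, so leave a generous margin below the 1,000,000-token window; `rough_tokens` and `trim_history` are hypothetical names. Dropping single messages can also separate a tool result from the assistant message that requested it, so a production version should remove such messages in pairs.

```python
import json

CONTEXT_WINDOW = 1_000_000


def rough_tokens(messages: list) -> int:
    """Heuristic token estimate from the serialized JSON (roughly 4 characters per token)."""
    return len(json.dumps(messages, ensure_ascii=False)) // 4 + 1


def trim_history(messages: list, budget: int) -> list:
    """Drop the oldest non-system messages until the estimate fits within budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    # Always keep the most recent message; it usually carries the current request.
    while len(rest) > 1 and rough_tokens(system + rest) > budget:
        rest.pop(0)
    if rough_tokens(system + rest) > budget:
        raise ValueError("latest message alone exceeds the budget; chunk it or use retrieval")
    return system + rest


# Example budget: reserve room for a 4,096-token answer inside the context window.
budget = CONTEXT_WINDOW - 4_096
```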
The bill is higher than expected
Check whether the prompt entered the 524,288-to-1,000,000-token pricing band. MiniMax M3 has higher input, output, and cache-read prices in that long-context tier.
Structured output is inconsistent
Start with a smaller schema, lower temperature, and explicit validation. If your application requires strict JSON, handle malformed responses with validation and retry logic instead of assuming every response will parse.
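The validate-and-retry pattern can be sketched as follows. `generate` is a hypothetical callable that sends the request and returns the raw message content, and `required_keys` stands in for your real schema; for full JSON Schema validation you could swap in a library such as jsonschema.

```python
import json
from typing import Callable


def parse_with_retry(generate: Callable[[], str], required_keys: set, max_attempts: int = 3) -> dict:
    """Call generate() until it returns a JSON object with the required keys, or give up."""
    last_error = "no attempts made"
    for attempt in range(1, max_attempts + 1):
        raw = generate()  # hypothetical wrapper around the chat completions call
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = f"attempt {attempt}: invalid JSON ({exc.msg})"
            continue
        if not isinstance(data, dict):
            last_error = f"attempt {attempt}: expected a JSON object"
            continue
        missing = required_keys - set(data)
        if missing:
            last_error = f"attempt {attempt}: missing keys {sorted(missing)}"
            continue
        return data
    raise ValueError(f"no valid response after {max_attempts} attempts; {last_error}")
```

Capping attempts keeps a persistently malformed response from turning into an unbounded loop of billed requests; each retry costs as much as the original call.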
Tool calls do not match your function schema
Test one tool at a time. Keep function names, descriptions, and parameter schemas clear, and add server-side validation before executing any tool call.
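A minimal sketch of server-side validation before executing a tool call. It assumes tool calls arrive in the OpenAI-compatible shape, where `choices[0].message.tool_calls` holds entries with `function.name` and a JSON-encoded `function.arguments` string. The `get_file` tool and its argument lists are hypothetical placeholders for your own schema.

```python
import json

# Hypothetical registry: the only tools this server is willing to run.
TOOL_SCHEMAS = {
    "get_file": {"required": {"path"}, "allowed": {"path", "max_bytes"}},
}


def validate_tool_call(name: str, arguments_json: str) -> dict:
    """Return parsed arguments, or raise ValueError before anything is executed."""
    schema = TOOL_SCHEMAS.get(name)
    if schema is None:
        raise ValueError(f"unknown tool {name!r}")
    try:
        args = json.loads(arguments_json)
    except json.JSONDecodeError as exc:
        raise ValueError(f"arguments for {name!r} are not valid JSON: {exc.msg}") from exc
    if not isinstance(args, dict):
        raise ValueError(f"arguments for {name!r} must be a JSON object")
    missing = schema["required"] - set(args)
    unexpected = set(args) - schema["allowed"]
    if missing or unexpected:
        raise ValueError(
            f"bad arguments for {name!r}: missing={sorted(missing)}, unexpected={sorted(unexpected)}"
        )
    return args
```

Rejecting unknown tools and unexpected arguments, not just missing ones, keeps the model from reaching functionality you never declared.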
FAQ
Is MiniMax M3 available through the Novita AI API?
Yes. The current Novita AI model page lists MiniMax M3 as available through serverless API access with the model ID minimax/minimax-m3.
What is the model ID for MiniMax M3?
Use minimax/minimax-m3.
What base URL should I use?
Use https://api.novita.ai/openai as the OpenAI-compatible base URL. In OpenAI SDKs, set the SDK base URL to https://api.novita.ai/openai/v1.
How much does MiniMax M3 cost on Novita AI?
MiniMax M3 pricing is tiered. For prompts below 524,288 tokens, input is $0.30 per 1M tokens, output is $1.20 per 1M tokens, and cache read is $0.06 per 1M tokens. For prompts from 524,288 to 1,000,000 tokens, input is $1.20 per 1M tokens, output is $4.80 per 1M tokens, and cache read is $0.24 per 1M tokens.
Does MiniMax M3 support streaming or multimodal input?
The current model page lists text, image, and video inputs with text output. Streaming behavior should be tested through the OpenAI-compatible chat completions path before production use.
What is the maximum context window?
MiniMax M3 currently has a 1,000,000-token context window and a maximum output of 131,072 tokens.