MiniMax M3 API Key, Model IDs & OpenAI Endpoint


MiniMax M3 is available through Novita AI with the model ID minimax/minimax-m3, an OpenAI-compatible base URL, a 1,000,000-token context window, and tiered pricing for longer prompts. This quick start keeps the setup practical: copy the model ID, set your API key, run one small request, then scale up to longer prompts once the basics work.


When to Use This Quick Start

Use this quick start when you want to test MiniMax M3 through a serverless API path instead of building around raw model hosting. It is for developers who already have a prompt or workload in mind and need the endpoint, model ID, token limits, and pricing details before writing a small proof of concept.

MiniMax M3 is a strong fit when your workload involves a large context budget, structured output, tool calling, or coding assistance over long inputs. On Novita AI, the current model page lists text, image, and video as accepted input modalities, text as the output modality, and support for serverless access, function calling, structured output, reasoning, and Anthropic API compatibility.

This is not a benchmark deep dive or a launch announcement. The goal is simpler: make one clean request, then decide whether MiniMax M3 fits your application.

Step 1: Get Your Novita API Key

Create or select a Novita AI account, open your API key settings, and generate a key for server-side use. Keep the key out of client-side code, frontend bundles, public repositories, and notebooks that may be shared outside your team.

Set the key as an environment variable before running the examples:

export NOVITA_API_KEY="your_api_key_here"

If you are testing in a team environment, use a scoped project key or a temporary key if your account setup supports it. Rotate the key after public demos, shared experiments, or any accidental exposure.
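As a minimal sketch of reading the key safely in Python, the helper below fails early with a readable message when the variable is missing. The function name load_novita_api_key is illustrative and not part of any SDK.

```python
import os


def load_novita_api_key(env_var: str = "NOVITA_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error."""
    # Strip whitespace so a stray newline from a copy-paste does not break auth.
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set. Export it in the shell that runs this script."
        )
    return key
```

Calling this once at startup surfaces a missing key before any request is sent, which is usually easier to debug than an authentication error from the API.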

Step 2: Confirm Model ID and Endpoint

Before you write code, keep the MiniMax M3 connection details in one place:

| Field | Value |
| --- | --- |
| Model ID | minimax/minimax-m3 |
| Base URL | https://api.novita.ai/openai |
| Chat completions URL | https://api.novita.ai/openai/v1/chat/completions |
| Context window | 1,000,000 tokens |
| Maximum output | 131,072 tokens |
| Inputs | Text, image, video |
| Output | Text |
| Serverless support | Supported |
| Function calling | Supported |
| Structured output | Supported |
| Reasoning | Supported |
| Anthropic API compatibility | Supported |

Check the MiniMax M3 model documentation before you ship, since availability, pricing, and limits can change.
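One way to keep these details in one place in code is a small constants module. This is a sketch: the constant names are illustrative, and the values are copied from the connection details listed in this step.

```python
# Connection details for MiniMax M3 on Novita AI, as listed in this quick start.
MODEL_ID = "minimax/minimax-m3"
BASE_URL = "https://api.novita.ai/openai"

# OpenAI SDKs take the versioned base URL; raw HTTP clients post to the full path.
SDK_BASE_URL = f"{BASE_URL}/v1"
CHAT_COMPLETIONS_URL = f"{SDK_BASE_URL}/chat/completions"

# Limits at the time of writing; recheck the model documentation before shipping.
CONTEXT_WINDOW_TOKENS = 1_000_000
MAX_OUTPUT_TOKENS = 131_072
```

Deriving both URLs from BASE_URL means a future endpoint change only needs to be made once.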

Step 3: Send Your First Request

Start with a short text-only chat request. It is much easier to debug authentication, routing, and response parsing before you add a large prompt.

For the first prompt, ask for a short output that is easy to check by eye. For example: Summarize the main implementation risks in a long-context code review process.

Keep max_tokens modest for the first call. MiniMax M3 supports much longer output, but the first job is to confirm that the integration works.

Step 4: Read the Response

An OpenAI-compatible chat completion response usually returns the assistant answer at choices[0].message.content.

Also log the request ID or response metadata your runtime exposes. Those details are useful when a request fails or runs slowly. For cost tracking, record prompt size, output size, cache-read usage if your workload uses cached context, and whether the request entered the long-context pricing band.
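The logging advice above can be sketched as a small helper that pulls the answer and basic usage numbers out of a response dict. It assumes the response follows the usual OpenAI-compatible shape, including an id field and a usage object with prompt_tokens and completion_tokens. Confirm the exact fields Novita AI returns before relying on them.

```python
from typing import Any


def summarize_response(response: dict[str, Any]) -> dict[str, Any]:
    """Extract the answer text and basic usage numbers from a response dict."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response has no choices")
    message = choices[0].get("message") or {}
    # usage may be absent on some responses, so default to an empty dict.
    usage = response.get("usage") or {}
    return {
        "request_id": response.get("id"),
        "answer": message.get("content"),
        "finish_reason": choices[0].get("finish_reason"),
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
    }
```

With the OpenAI Python SDK (version 1 or later), passing response.model_dump() should give a dict of this shape. Missing fields come back as None rather than raising, so the helper can be used for logging even when metadata is incomplete.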

Do not treat the first output as proof that your prompt is ready for users. Once the integration works, test prompts that look like your real workload: long codebases, multi-file instructions, tool schemas, structured JSON output, or multimodal inputs if your application needs them.

Step 5: Check Pricing, Limits, and Common Errors

MiniMax M3 uses tiered pricing on Novita AI. The price changes once the prompt enters the long-context band:

| Prompt size band | Input | Output | Cache read |
| --- | --- | --- | --- |
| Less than 524,288 tokens | $0.30 per 1M tokens | $1.20 per 1M tokens | $0.06 per 1M tokens |
| 524,288 to 1,000,000 tokens | $1.20 per 1M tokens | $4.80 per 1M tokens | $0.24 per 1M tokens |

That split matters. A 50,000-token test and a near-1M-token request are not priced the same way. When you estimate cost, include prompt length, expected output length, cache behavior, retries, and how often users may send very large requests.
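To make the difference concrete, here is a rough cost estimator built from the prices listed in this step. It rests on two assumptions that are not confirmed by the source: that the band is chosen by total prompt size and applies to the whole request, and that cached prompt tokens are billed at the cache-read rate instead of the input rate. Treat the result as an estimate, not an invoice.

```python
LONG_CONTEXT_THRESHOLD = 524_288  # prompt tokens at which the higher band starts

# USD per 1M tokens, copied from the pricing listed in this step.
RATES = {
    "standard": {"input": 0.30, "output": 1.20, "cache_read": 0.06},
    "long": {"input": 1.20, "output": 4.80, "cache_read": 0.24},
}


def estimate_cost_usd(prompt_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the cost of one request in USD under the assumptions above."""
    if prompt_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    band = "long" if prompt_tokens >= LONG_CONTEXT_THRESHOLD else "standard"
    rate = RATES[band]
    uncached_tokens = prompt_tokens - cached_tokens
    total = (
        uncached_tokens * rate["input"]
        + cached_tokens * rate["cache_read"]
        + output_tokens * rate["output"]
    ) / 1_000_000
    return round(total, 6)
```

Under these assumptions, a 50,000-token prompt with 1,000 output tokens comes to about $0.0162, while a 600,000-token prompt with the same output comes to about $0.7248. The larger request has 12 times the prompt tokens but costs roughly 45 times as much, because every rate is four times higher in the long-context band.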

MiniMax M3 currently has a 1,000,000-token context window and a 131,072-token maximum output. Before shipping, recheck the MiniMax M3 model documentation for the latest price table and any rate-limit guidance attached to your account.

Common setup errors include:

  • Missing or malformed Authorization header.
  • Using the wrong model ID, such as a display name instead of minimax/minimax-m3.
  • Sending requests to the wrong base URL.
  • Setting max_tokens higher than your application can safely consume.
  • Testing long-context prompts without accounting for the higher pricing band.
  • Passing multimodal content in a shape that your client library does not support.
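Several of the errors above can be caught before a request leaves your machine. The sketch below is a hypothetical preflight check, not part of any SDK. It compares request settings against the values given in this quick start and returns a list of human-readable problems.

```python
EXPECTED_MODEL_ID = "minimax/minimax-m3"
EXPECTED_SDK_BASE_URL = "https://api.novita.ai/openai/v1"
MAX_OUTPUT_TOKENS = 131_072


def preflight_problems(
    model: str, base_url: str, headers: dict[str, str], max_tokens: int
) -> list[str]:
    """Return a list of likely setup mistakes; an empty list means none were found."""
    problems = []
    auth = headers.get("Authorization", "")
    # The header must look like "Bearer <key>" with a non-empty key.
    if not auth.startswith("Bearer ") or not auth[len("Bearer "):].strip():
        problems.append("Authorization header is missing or not in 'Bearer <key>' form")
    if model != EXPECTED_MODEL_ID:
        problems.append(f"model should be {EXPECTED_MODEL_ID!r}, got {model!r}")
    if base_url.rstrip("/") != EXPECTED_SDK_BASE_URL:
        problems.append(f"base URL should be {EXPECTED_SDK_BASE_URL!r}, got {base_url!r}")
    if not 1 <= max_tokens <= MAX_OUTPUT_TOKENS:
        problems.append(f"max_tokens must be between 1 and {MAX_OUTPUT_TOKENS}")
    return problems
```

A check like this does not replace real error handling, but it turns a vague failed request into a specific message during early testing.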

Python Example

This example uses the OpenAI Python SDK with Novita AI’s OpenAI-compatible base URL.

import os
from openai import OpenAI

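# The OpenAI SDK expects the versioned base URL, so "/v1" is included here.
client = OpenAI(
    api_key=os.environ["NOVITA_API_KEY"],  # raises KeyError if the variable is unset
    base_url="https://api.novita.ai/openai/v1",
)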

response = client.chat.completions.create(
    model="minimax/minimax-m3",
    messages=[
        {
            "role": "system",
            "content": "You are a concise API assistant for software teams.",
        },
        {
            "role": "user",
            "content": "Summarize the main implementation risks in a long-context code review process.",
        },
    ],
    temperature=0.2,
    max_tokens=600,
)

answer = response.choices[0].message.content
print(answer)

Send the Request with cURL

If you prefer cURL, keep the JSON body in a payload variable. This makes the request easier to read and avoids cramming the full JSON body into a single command.

payload='{
  "model": "minimax/minimax-m3",
  "messages": [
    {
      "role": "system",
      "content": "You are a concise API assistant for software teams."
    },
    {
      "role": "user",
      "content": "Summarize the main implementation risks in a long-context code review process."
    }
  ],
  "temperature": 0.2,
  "max_tokens": 600
}'

curl --request POST "https://api.novita.ai/openai/v1/chat/completions" \
  --header "Authorization: Bearer $NOVITA_API_KEY" \
  --header "Content-Type: application/json" \
  --data "$payload"

Key Parameters

| Parameter | What it controls | Start with |
| --- | --- | --- |
| model | Which hosted model answers the request | minimax/minimax-m3 |
| messages | System and user instructions | A short, text-only prompt |
| temperature | Output variability | 0.2 for repeatable tests |
| max_tokens | Maximum generated output | A small cap, then raise it later |
| stream | Whether tokens stream back progressively | Enable after the basic call works |
| tools | Function/tool definitions | Add one tool at a time |
| response_format | Structured response shape | Validate the output before using it |

For multimodal inputs, confirm the exact request shape in your SDK or API documentation before relying on image or video prompts. The model page lists modality support, but request formatting depends on the client path you use.
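As an illustration of adding one tool at a time, the sketch below builds the keyword arguments for a chat completion call with a single tool attached. The tool lookup_order_status is hypothetical, and the tools layout follows the OpenAI chat completions format. Whether Novita AI accepts exactly this shape for MiniMax M3 is an assumption to verify against its documentation.

```python
MODEL_ID = "minimax/minimax-m3"

# Hypothetical tool used only for illustration; replace it with your own function.
LOOKUP_ORDER_TOOL = {
    "type": "function",
    "function": {
        "name": "lookup_order_status",
        "description": "Return the shipping status for a single order ID.",
        "parameters": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string", "description": "Order identifier."}
            },
            "required": ["order_id"],
        },
    },
}


def build_tool_request(user_prompt: str, max_tokens: int = 600) -> dict:
    """Build keyword arguments for a chat completion call with one tool attached."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": user_prompt}],
        "temperature": 0.2,
        "max_tokens": max_tokens,
        "tools": [LOOKUP_ORDER_TOOL],
    }
```

With the OpenAI Python SDK, the result can be passed as client.chat.completions.create(**build_tool_request("Where is order A-1?")). Building the arguments in a plain function keeps them easy to inspect and test without sending a request.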

Troubleshooting

Authentication fails

Check that NOVITA_API_KEY is set in the same shell or runtime where you run the request. The authorization header must use the bearer-token format.

The API cannot find the model

Confirm that the request uses minimax/minimax-m3, not MiniMax M3, minimax-m3, or a blog title. Model display names and model IDs are not interchangeable.

The request works for short prompts but fails for long prompts

Measure the serialized input, not just the visible word count. Tool schemas, retrieved documents, image references, and conversation history all count. If you are getting close to 1,000,000 tokens, try a smaller prompt and add truncation or retrieval logic before retrying.

The bill is higher than expected

Check whether the prompt entered the 524,288-to-1,000,000-token pricing band. MiniMax M3 has higher input, output, and cache-read prices in that long-context tier.

Structured output is inconsistent

Start with a smaller schema, lower temperature, and explicit validation. If your application requires strict JSON, handle malformed responses with validation and retry logic instead of assuming every response will parse.
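The validate-and-retry approach can be sketched as follows. The generate argument is any zero-argument callable that returns the model's raw text, for example a function wrapping your chat completion call. The function name and the required-keys check are illustrative. A production system might use a full JSON Schema validator instead.

```python
import json
from typing import Callable


def get_valid_json(
    generate: Callable[[], str], required_keys: set[str], max_attempts: int = 3
) -> dict:
    """Call generate() until it returns a JSON object containing required_keys."""
    last_error: Exception | None = None
    for _ in range(max_attempts):
        raw = generate()
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = exc  # malformed JSON: try again
            continue
        if isinstance(data, dict) and required_keys.issubset(data.keys()):
            return data
        last_error = ValueError("JSON parsed but required keys are missing")
    raise ValueError(f"no valid JSON after {max_attempts} attempts") from last_error
```

Capping the number of attempts matters because every retry is a billed request. Log the failures so you can tell whether the schema or the prompt needs to change.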

Tool calls do not match your function schema

Test one tool at a time. Keep function names, descriptions, and parameter schemas clear, and add server-side validation before executing any tool call.
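Server-side validation before execution might look like the sketch below. It assumes tool calls arrive in the OpenAI-compatible form, where the arguments are a JSON string. The registry TOOL_SPECS and the tool lookup_order_status are hypothetical examples.

```python
import json

# Hypothetical registry: tool name -> required argument names and their types.
TOOL_SPECS = {
    "lookup_order_status": {"order_id": str},
}


def validate_tool_call(name: str, arguments_json: str) -> dict:
    """Return parsed arguments if the call matches a known tool, else raise ValueError."""
    if name not in TOOL_SPECS:
        raise ValueError(f"unknown tool: {name!r}")
    try:
        args = json.loads(arguments_json)
    except json.JSONDecodeError as exc:
        raise ValueError(f"arguments for {name!r} are not valid JSON") from exc
    if not isinstance(args, dict):
        raise ValueError("tool arguments must be a JSON object")
    spec = TOOL_SPECS[name]
    for field, expected_type in spec.items():
        if not isinstance(args.get(field), expected_type):
            raise ValueError(f"{name!r} needs {field!r} of type {expected_type.__name__}")
    # Reject extra arguments so unexpected input never reaches the real function.
    unexpected = set(args) - set(spec)
    if unexpected:
        raise ValueError(f"unexpected arguments: {sorted(unexpected)}")
    return args
```

Only after validate_tool_call returns should the application run the real function. Rejected calls can be logged or sent back to the model as an error message.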

FAQ

Is MiniMax M3 available through the Novita AI API?

Yes. The current Novita AI model page lists MiniMax M3 as available through serverless API access with the model ID minimax/minimax-m3.

What is the model ID for MiniMax M3?

Use minimax/minimax-m3.

What base URL should I use?

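The OpenAI-compatible base URL is https://api.novita.ai/openai. OpenAI SDKs expect the versioned form, so set the SDK base URL to https://api.novita.ai/openai/v1. For raw HTTP requests, post to https://api.novita.ai/openai/v1/chat/completions.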

How much does MiniMax M3 cost on Novita AI?

MiniMax M3 pricing is tiered. For prompts below 524,288 tokens, input is $0.30 per 1M tokens, output is $1.20 per 1M tokens, and cache read is $0.06 per 1M tokens. For prompts from 524,288 to 1,000,000 tokens, input is $1.20 per 1M tokens, output is $4.80 per 1M tokens, and cache read is $0.24 per 1M tokens.

Does MiniMax M3 support streaming or multimodal input?

For multimodal input, the current model page lists text, image, and video inputs with text output. For streaming, the stream parameter is the place to start, but test streaming behavior through the OpenAI-compatible chat completions path before production use.
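When you test streaming, a small helper that reassembles the streamed pieces makes the result easy to compare with a non-streamed call. This sketch assumes chunks follow the OpenAI SDK shape, where each chunk exposes choices[0].delta.content. The helper name is illustrative.

```python
from typing import Any, Iterable


def collect_stream_text(chunks: Iterable[Any]) -> str:
    """Join the text deltas from a stream of chat completion chunks."""
    parts = []
    for chunk in chunks:
        # Some chunks carry no choices (for example, a trailing usage chunk).
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta
        # content can be None on role-only or tool-call deltas.
        if delta.content:
            parts.append(delta.content)
    return "".join(parts)
```

With the OpenAI Python SDK, the iterable would be the object returned by client.chat.completions.create(..., stream=True).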

What is the maximum context window?

MiniMax M3 currently has a 1,000,000-token context window and a maximum output of 131,072 tokens.

