MiMo 2.5 API on Novita AI: OpenAI-Compatible Chat API

Table Of Contents

Quick setup
What Xiaomi MiMo-V2.5-Pro is
When to use MiMo-V2.5-Pro
Prerequisites
First API request
Streaming and long-context usage notes
Function calling example
Structured output example
Cost, latency, and limit cautions
Troubleshooting
Next steps

Xiaomi MiMo-V2.5-Pro, often searched as the MiMo 2.5 API, is available on Novita AI through the Serverless API. Developers can call the verified model ID xiaomimimo/mimo-v2.5-pro with Novita’s OpenAI-compatible chat completions endpoint for long-context text workflows, reasoning-heavy coding tasks, function calling, and structured output experiments.

Quick setup

Use this starting point when you already have a Novita AI API key and want the shortest verified path to a first MiMo 2.5 API request.

pip install openai
export NOVITA_API_KEY="YOUR_NOVITA_API_KEY"


import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["NOVITA_API_KEY"],
    base_url="https://api.novita.ai/openai",
)

response = client.chat.completions.create(
    model="xiaomimimo/mimo-v2.5-pro",
    messages=[
        {
            "role": "system",
            "content": "You are a concise software architecture assistant.",
        },
        {
            "role": "user",
            "content": "Outline a migration plan for moving a Python monolith into services.",
        },
    ],
    max_tokens=2048,
    temperature=0.7,
)

print(response.choices[0].message.content)

The current Xiaomi MiMo-V2.5-Pro model page lists the model as a Novita AI Serverless API option and shows the OpenAI-compatible base URL https://api.novita.ai/openai. The current Novita AI chat completions API reference documents the REST path as /openai/v1/chat/completions. If your application already uses OpenAI SDK patterns, the main changes are the base URL, API key, and model ID. Novita’s public docs list the chat completions endpoint, not the OpenAI Responses API, so keep calls on client.chat.completions.create.

What Xiaomi MiMo-V2.5-Pro is

Xiaomi MiMo-V2.5-Pro is a text-in, text-out large language model for complex agentic and software engineering workflows. On Novita AI, the API model ID is:


xiaomimimo/mimo-v2.5-pro

The Novita AI model listing currently verifies these implementation details:

Field	Current value
Access path	Serverless API
Endpoint family	OpenAI-compatible chat completions
Base URL	`https://api.novita.ai/openai`
Model ID	`xiaomimimo/mimo-v2.5-pro`
Context length	1,048,576 tokens
Max output	131,072 tokens
Input capability	Text
Output capability	Text
Function calling	Supported
Structured output	Supported
Reasoning	Supported
Anthropic API	Supported

Upstream, the Xiaomi MiMo-V2.5-Pro Hugging Face model card describes MiMo-V2.5-Pro as an open-source Mixture-of-Experts language model with 1.02 trillion total parameters, 42 billion active parameters, and up to a 1 million-token context window. Treat the upstream model card as useful background for the model family, and use the Novita AI model page for Novita-specific model ID, pricing, availability, and endpoint details.

When to use MiMo-V2.5-Pro

MiMo-V2.5-Pro is most useful when your application needs a hosted text model for long-running, instruction-heavy work rather than a short, single-turn answer. Good evaluation targets include repository analysis, multi-file refactoring plans, long-context document synthesis, agent planning, tool-routing prototypes, and structured extraction tasks.

Use it when you need:

A verified 1,048,576-token context window on Novita AI.
A high max-output setting for detailed plans, code reviews, migration outlines, or multi-step reasoning traces.
Function calling support for routing model decisions into application tools.
Structured output support for JSON-like responses that downstream services can parse.
OpenAI-compatible chat completions so you can reuse existing SDK patterns.

Do not assume it is the lowest-cost or lowest-latency choice for every task. For short prompts, high-volume classification, or simple chat, compare current options in the Novita AI model library and Novita AI pricing page before routing all traffic to a long-context model.

Prerequisites

Before you make the first request, prepare four things:

A Novita AI account.
A Novita AI API key stored in an environment variable such as NOVITA_API_KEY.
The OpenAI Python SDK or another HTTP client that can call OpenAI-compatible endpoints.
A current pricing and limit check for xiaomimimo/mimo-v2.5-pro.

The current Novita AI model page lists token-based serverless pricing for MiMo-V2.5-Pro at $0.522 per million input tokens, $0.0043 per million cache-read tokens, and $1.044 per million output tokens.

Because model pages and pricing tables can change, confirm current rates on the model page or pricing page before production use, especially for long prompts where small per-token changes can become material.
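To make the pricing arithmetic concrete, here is a minimal estimator built on the three rates quoted above. It is a sketch under two assumptions that the model page may not match: rates are flat rather than tiered, and cache-read tokens are billed at the cache-read rate instead of the input rate. The function and constant names are illustrative, not part of any Novita AI SDK.

```python
# USD per million tokens, copied from the figures quoted in this article.
# Assumption: flat rates. Re-check the model page before relying on them.
INPUT_RATE = 0.522
CACHE_READ_RATE = 0.0043
OUTPUT_RATE = 1.044


def estimate_cost_usd(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate one request's cost; cached_tokens is the cache-read share of the input."""
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    uncached_tokens = input_tokens - cached_tokens
    total = (
        uncached_tokens * INPUT_RATE
        + cached_tokens * CACHE_READ_RATE
        + output_tokens * OUTPUT_RATE
    )
    return total / 1_000_000


# A 200,000-token prompt with 4,000 output tokens and no cache hits.
print(f"${estimate_cost_usd(200_000, 4_000):.4f}")
```

At these rates the example request comes to roughly $0.11, which shows how quickly long prompts dominate the bill compared with the output side.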

First API request

The simplest first call uses the OpenAI SDK with Novita AI’s base URL and the MiMo-V2.5-Pro model ID.

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["NOVITA_API_KEY"],
    base_url="https://api.novita.ai/openai",
)

response = client.chat.completions.create(
    model="xiaomimimo/mimo-v2.5-pro",
    messages=[
        {"role": "system", "content": "You are a practical coding assistant."},
        {
            "role": "user",
            "content": "Review this API design and list the main reliability risks.",
        },
    ],
    max_tokens=1024,
    temperature=0.7,
)

print(response.choices[0].message.content)

If you prefer REST, call the current chat completions path directly:

curl --request POST \
  --url https://api.novita.ai/openai/v1/chat/completions \
  --header "Authorization: Bearer $NOVITA_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "model": "xiaomimimo/mimo-v2.5-pro",
    "messages": [
      {
        "role": "system",
        "content": "You are a practical coding assistant."
      },
      {
        "role": "user",
        "content": "Create a checklist for validating a payment webhook integration."
      }
    ],
    "max_tokens": 1024,
    "temperature": 0.7
  }'

For production code, keep API keys out of source control, cap max_tokens to the actual amount your workflow needs, and log token usage so you can see when long-context prompts start to dominate cost.
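One way to log token usage is a small helper that reads the usage object from a chat completion response. The sketch below assumes the response follows the OpenAI SDK shape (usage.prompt_tokens, usage.completion_tokens, usage.total_tokens); the helper name and the stand-in response object are hypothetical and exist only so the example runs offline.

```python
from types import SimpleNamespace
from typing import Optional


def usage_summary(response) -> Optional[dict]:
    """Return token counts from a response, or None when usage is absent."""
    usage = getattr(response, "usage", None)
    if usage is None:
        return None
    return {
        "prompt_tokens": usage.prompt_tokens,
        "completion_tokens": usage.completion_tokens,
        "total_tokens": usage.total_tokens,
    }


# Stand-in shaped like an SDK response; replace with a real response object.
fake_response = SimpleNamespace(
    usage=SimpleNamespace(prompt_tokens=120, completion_tokens=48, total_tokens=168)
)
print(usage_summary(fake_response))
```

In an application you would pass the summary to your own logger or metrics client, tagged with the feature or workflow that made the call.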

Streaming and long-context usage notes

The Novita AI chat completions reference includes stream and stream_options fields, and the MiMo-V2.5-Pro model page verifies a 1,048,576-token context length with a 131,072-token max output. Use those limits as engineering ceilings, not default settings.
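A pre-flight check can catch oversized requests before they reach the API. The sketch below uses the two limits quoted above and assumes that prompt tokens and requested output tokens share the context window, which is the usual convention but is not confirmed here for this model. Token counts must come from your own tokenizer or estimate; the function name is illustrative.

```python
CONTEXT_LIMIT = 1_048_576   # context length quoted in this article
MAX_OUTPUT_LIMIT = 131_072  # max output quoted in this article


def check_request_budget(prompt_tokens: int, max_tokens: int) -> list[str]:
    """Return a list of problems; an empty list means the request fits."""
    problems = []
    if max_tokens > MAX_OUTPUT_LIMIT:
        problems.append(
            f"max_tokens={max_tokens} exceeds the max output of {MAX_OUTPUT_LIMIT}"
        )
    # Assumption: prompt and output tokens count against the same window.
    if prompt_tokens + max_tokens > CONTEXT_LIMIT:
        problems.append(
            f"prompt ({prompt_tokens}) plus max_tokens ({max_tokens}) "
            f"exceeds the context length of {CONTEXT_LIMIT}"
        )
    return problems


print(check_request_budget(prompt_tokens=1_000_000, max_tokens=131_072))
```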

For a streamed response:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["NOVITA_API_KEY"],
    base_url="https://api.novita.ai/openai",
)

stream = client.chat.completions.create(
    model="xiaomimimo/mimo-v2.5-pro",
    messages=[
        {"role": "system", "content": "You write concise engineering plans."},
        {
            "role": "user",
            "content": "Draft a step-by-step rollback plan for a failed database migration.",
        },
    ],
    max_tokens=2048,
    temperature=0.7,
    stream=True,
)

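for chunk in stream:
    # Some chunks (for example a trailing usage chunk) may carry no choices.
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()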
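If you need the full text and the token usage after streaming finishes, you can accumulate chunks instead of printing them. The helper below is a sketch that assumes Novita AI follows the OpenAI streaming convention, where passing stream_options={"include_usage": True} adds a final chunk with an empty choices list and a populated usage field. The helper name and the fake chunks are hypothetical and only make the example runnable offline.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Join streamed text deltas and keep the usage object if any chunk carries one."""
    parts = []
    usage = None
    for chunk in chunks:
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage
        if not chunk.choices:
            continue  # e.g. the trailing usage-only chunk
        delta = chunk.choices[0].delta.content
        if delta:
            parts.append(delta)
    return "".join(parts), usage


def fake_chunk(text=None, usage=None, with_choice=True):
    """Build a stand-in object shaped like a streamed chat completion chunk."""
    choices = [SimpleNamespace(delta=SimpleNamespace(content=text))] if with_choice else []
    return SimpleNamespace(choices=choices, usage=usage)


chunks = [
    fake_chunk("Roll "),
    fake_chunk("back."),
    fake_chunk(None),  # a delta with no text
    fake_chunk(usage=SimpleNamespace(total_tokens=42), with_choice=False),
]
text, usage = collect_stream(chunks)
print(text, usage.total_tokens)
```

With a real client, pass the stream object returned by client.chat.completions.create(..., stream=True) in place of the fake chunk list.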

For long-context prompts, keep these practices in place:

Put the task, output format, and stop conditions near the beginning of the conversation.
Split unrelated files or documents into clearly labeled sections.
Ask the model to cite section names or file paths from the supplied context rather than inventing locations.
Start with smaller max_tokens values during development, then raise the cap only when the workflow needs longer output.
Track prompt, completion, and total tokens from the usage object when the response includes it.
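The first three practices can be sketched as a small prompt builder that puts instructions first and wraps each document in a labeled section. The delimiter format and function name are illustrative choices, not a format the model requires.

```python
def build_long_context_prompt(task: str, output_format: str, documents: dict[str, str]) -> str:
    """Place the task and output format first, then each document under a labeled header."""
    lines = [
        "TASK:",
        task,
        "",
        "OUTPUT FORMAT:",
        output_format,
        "",
        "Cite the section labels below when you refer to the supplied context.",
        "",
    ]
    for label, text in documents.items():
        lines.append(f"=== BEGIN {label} ===")
        lines.append(text.strip())
        lines.append(f"=== END {label} ===")
        lines.append("")
    return "\n".join(lines).rstrip()


prompt = build_long_context_prompt(
    task="List the reliability risks in these files.",
    output_format="A numbered list; stop after ten items.",
    documents={
        "src/app.py": "def handler(event): ...",
        "docs/runbook.md": "Restart the worker if the queue stalls.",
    },
)
print(prompt)
```

The resulting string can be sent as the user message content, with the file paths doubling as citation targets.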

Function calling example

The MiMo-V2.5-Pro model page verifies function calling support, and the chat completions reference includes a tools parameter with function metadata. Use this when the model should choose an application action, such as looking up an internal ticket, fetching account state, or creating a deployment task.

import json
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["NOVITA_API_KEY"],
    base_url="https://api.novita.ai/openai",
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "create_deployment_ticket",
            "description": "Create a deployment follow-up ticket.",
            "parameters": {
                "type": "object",
                "properties": {
                    "service": {
                        "type": "string",
                        "description": "The service that needs follow-up.",
                    },
                    "priority": {
                        "type": "string",
                        "enum": ["low", "medium", "high"],
                    },
                    "summary": {
                        "type": "string",
                        "description": "A short ticket summary.",
                    },
                },
                "required": ["service", "priority", "summary"],
                "additionalProperties": False,
            },
            "strict": True,
        },
    }
]

response = client.chat.completions.create(
    model="xiaomimimo/mimo-v2.5-pro",
    messages=[
        {
            "role": "system",
            "content": "Decide whether the user's deployment note requires a follow-up ticket.",
        },
        {
            "role": "user",
            "content": "Checkout latency increased after the payment-service deploy. Create a high priority follow-up.",
        },
    ],
    tools=tools,
    max_tokens=1024,
    temperature=0.2,
)

message = response.choices[0].message

if message.tool_calls:
    tool_call = message.tool_calls[0]
    args = json.loads(tool_call.function.arguments)
    print(args)
else:
    print(message.content)

In a real application, validate the tool arguments server-side before executing any action. Function calling gives your application a structured decision from the model; it does not replace authorization, input validation, audit logging, or rollback controls.
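As a minimal sketch of that server-side validation, the function below re-checks the arguments for the create_deployment_ticket tool defined above before anything is executed. It uses only the standard library; the function name and error messages are illustrative, and a production system would likely add authorization and length limits on top.

```python
import json

ALLOWED_PRIORITIES = {"low", "medium", "high"}
REQUIRED_FIELDS = ("service", "priority", "summary")


def validate_ticket_args(raw_arguments: str) -> dict:
    """Parse and check tool-call arguments; raise ValueError if anything is off."""
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        raise ValueError(f"arguments are not valid JSON: {exc}") from exc
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    unexpected = set(args) - set(REQUIRED_FIELDS)
    if unexpected:
        raise ValueError(f"unexpected fields: {sorted(unexpected)}")
    for field in REQUIRED_FIELDS:
        value = args.get(field)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"{field} must be a non-empty string")
    if args["priority"] not in ALLOWED_PRIORITIES:
        raise ValueError("priority must be low, medium, or high")
    return args


checked = validate_ticket_args(
    '{"service": "payment-service", "priority": "high", "summary": "Checkout latency regression"}'
)
print(checked)
```

Call it with tool_call.function.arguments and only run the real action when it returns without raising.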

Structured output example

The MiMo-V2.5-Pro model page also verifies structured output support, and the Novita AI chat completions reference includes response_format with JSON schema fields. Use structured output when you want the model to return parseable data rather than free-form prose.

import json
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["NOVITA_API_KEY"],
    base_url="https://api.novita.ai/openai",
)

response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "migration_risk_report",
        "schema": {
            "type": "object",
            "properties": {
                "risk_level": {
                    "type": "string",
                    "enum": ["low", "medium", "high"],
                },
                "main_risks": {
                    "type": "array",
                    "items": {"type": "string"},
                },
                "next_actions": {
                    "type": "array",
                    "items": {"type": "string"},
                },
            },
            "required": ["risk_level", "main_risks", "next_actions"],
            "additionalProperties": False,
        },
        "strict": True,
    },
}

response = client.chat.completions.create(
    model="xiaomimimo/mimo-v2.5-pro",
    messages=[
        {
            "role": "system",
            "content": "Return only a structured migration risk report.",
        },
        {
            "role": "user",
            "content": "We are moving billing jobs from cron to a queue and changing retry behavior.",
        },
    ],
    response_format=response_format,
    max_tokens=1024,
    temperature=0.2,
)

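content = response.choices[0].message.content
# content can be empty, for example when the output hits the token cap.
if not content:
    raise RuntimeError("Model returned no content to parse.")
report = json.loads(content)
print(json.dumps(report, indent=2))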

If the response fails JSON parsing during testing, reduce ambiguity in the prompt, lower temperature, keep the schema compact, and retry with a shorter input. For production workflows, validate the parsed object against your own schema before using it.
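The validation step might look like the sketch below, which parses the model's text and checks it against the migration_risk_report shape used in the example. It relies only on the standard library; the function name is hypothetical, and you could swap in a schema library if your project already uses one.

```python
import json
from typing import Optional

RISK_LEVELS = {"low", "medium", "high"}


def _is_string_list(value) -> bool:
    """True when value is a list whose items are all strings."""
    return isinstance(value, list) and all(isinstance(item, str) for item in value)


def parse_risk_report(content: Optional[str]) -> dict:
    """Parse the model's JSON text and check it against the expected shape."""
    if not content:
        raise ValueError("response content is empty")
    try:
        report = json.loads(content)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not valid JSON: {exc}") from exc
    if not isinstance(report, dict):
        raise ValueError("report must be a JSON object")
    level = report.get("risk_level")
    if not isinstance(level, str) or level not in RISK_LEVELS:
        raise ValueError("risk_level must be low, medium, or high")
    for field in ("main_risks", "next_actions"):
        if not _is_string_list(report.get(field)):
            raise ValueError(f"{field} must be a list of strings")
    return report


sample = '{"risk_level": "medium", "main_risks": ["duplicate retries"], "next_actions": ["add idempotency keys"]}'
print(parse_risk_report(sample))
```

Catching ValueError around this call gives you one place to decide whether to retry the request or surface the failure.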

Cost, latency, and limit cautions

MiMo-V2.5-Pro is built for long-context and agentic tasks, so cost control matters. A prompt close to the top context tier can be materially more expensive than a short request, and long outputs also increase completion-token cost. The Novita AI model page currently lists tiered input/output pricing, so estimate both prompt and completion tokens before routing large jobs automatically.

Use these controls:

Set a practical max_tokens cap instead of using the maximum output limit by default.
Summarize or retrieve only the context needed for the current task.
Cache or reuse stable context where your architecture supports it.
Monitor token usage per feature, user, and workflow.
Add timeouts and retries around network calls.
Keep a smaller fallback model for short tasks if your product does not need MiMo-V2.5-Pro on every request.
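For the timeout-and-retry control, a generic wrapper with exponential backoff is one option. The sketch below is standard-library only and is not specific to Novita AI; the function name, defaults, and the simulated flaky call are assumptions for illustration. The sleep function is injectable so the demo runs without waiting.

```python
import time


def call_with_retries(fn, attempts=3, base_delay=1.0, retry_on=(Exception,), sleep=time.sleep):
    """Call fn(); on a retryable error wait base_delay * 2**n seconds, then try again."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))


# Offline demo: a hypothetical flaky call that succeeds on the third try.
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"


delays = []
result = call_with_retries(flaky, retry_on=(TimeoutError,), sleep=delays.append)
print(result, delays)
```

With a real client you would wrap the request in a lambda, for example call_with_retries(lambda: client.chat.completions.create(...)), and narrow retry_on to transient errors. The OpenAI Python SDK client also accepts timeout and max_retries arguments, which may be enough on their own for simple cases.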

Latency can vary with prompt length, output length, streaming mode, and current service conditions. For user-facing applications, stream longer answers when appropriate and design the UI around incremental output rather than a single blocking response.

Troubleshooting

If your first MiMo-V2.5-Pro request fails, check these items first.

Symptom	Likely cause	Fix
Authentication error	Missing or malformed API key	Send `Authorization: Bearer $NOVITA_API_KEY` for REST calls or pass `api_key` to the SDK client.
Model not found	Incorrect model ID	Use `xiaomimimo/mimo-v2.5-pro` exactly as listed on the Novita AI model page.
Request path error	Mixed base URLs	Use `https://api.novita.ai/openai` with the OpenAI SDK, or `https://api.novita.ai/openai/v1/chat/completions` for REST.
Context or output error	Prompt plus output request exceeds model limits	Keep total prompt length within the current context window and cap `max_tokens` at or below the verified max output.
Tool call is missing	Prompt does not require a tool, or tool schema is unclear	Make the tool decision explicit and keep the function schema concise.
Structured output fails to parse	Schema or prompt is too loose	Use `response_format`, set `strict` when appropriate, lower temperature, and validate the result.
Unexpected cost	Large prompt, large output, or upper pricing tier	Check current pricing, log token usage, and reduce context or max output.

For endpoint details, refer to the Novita AI OpenAI-compatible chat completions API reference. For model-specific limits and pricing, refer to the MiMo 2.5 API and playground page.

Next steps

Start with the Xiaomi MiMo-V2.5-Pro model page, test a small prompt in the playground, then move the same prompt into your API client with the verified model ID. When you are ready to compare alternatives, use the Novita AI model library and Novita AI pricing page to check current pricing, context windows, output limits, and capability support.

For agent-style applications, evaluate MiMo-V2.5-Pro against your own traces: repository edits, tool-call routing, structured extraction, long-context summarization, and recovery from ambiguous instructions. Keep the evaluation tied to your application’s real prompts rather than generic benchmark claims. If you are still choosing between inference providers for agentic workflows, the inference provider selection guide for AI agents covers context length, function calling, latency, and concurrency criteria. For a coding-focused alternative at a different price point, CoBuddy on Novita AI is also available through the same OpenAI-compatible endpoint.
