How to Use Xiaomi MiMo-V2.5-Pro API on Novita AI

Xiaomi MiMo-V2.5-Pro is available on Novita AI through the Serverless API. Developers can call the verified model ID xiaomimimo/mimo-v2.5-pro through Novita’s OpenAI-compatible chat completions endpoint for long-context text workflows, reasoning-heavy coding tasks, function calling, and structured output experiments.

Table Of Contents

  • [Quick setup](#quick-setup)
  • [What Xiaomi MiMo-V2.5-Pro is](#what-xiaomi-mimo-v25-pro-is)
  • [When to use MiMo-V2.5-Pro](#when-to-use-mimo-v25-pro)
  • [Prerequisites](#prerequisites)
  • [First API request](#first-api-request)
  • [Streaming and long-context usage notes](#streaming-and-long-context-usage-notes)
  • [Function calling example](#function-calling-example)
  • [Structured output example](#structured-output-example)
  • [Cost, latency, and limit cautions](#cost-latency-and-limit-cautions)
  • [Troubleshooting](#troubleshooting)
  • [Next steps](#next-steps)

Quick setup

Use this starting point when you already have a Novita AI API key and want the shortest verified path to a first request.

# Shell: install the SDK and export your API key
pip install openai
export NOVITA_API_KEY="YOUR_NOVITA_API_KEY"

# Python: send a first chat completion request
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["NOVITA_API_KEY"],
    base_url="https://api.novita.ai/openai",
)

response = client.chat.completions.create(
    model="xiaomimimo/mimo-v2.5-pro",
    messages=[
        {
            "role": "system",
            "content": "You are a concise software architecture assistant.",
        },
        {
            "role": "user",
            "content": "Outline a migration plan for moving a Python monolith into services.",
        },
    ],
    max_tokens=2048,
    temperature=0.7,
)

print(response.choices[0].message.content)

The current Xiaomi MiMo-V2.5-Pro model page lists the model as a Novita AI Serverless API option and shows the OpenAI-compatible base URL https://api.novita.ai/openai. The current Novita AI chat completions API reference documents the REST path as /openai/v1/chat/completions.

What Xiaomi MiMo-V2.5-Pro is

Xiaomi MiMo-V2.5-Pro is a text-in, text-out large language model for complex agentic and software engineering workflows. On Novita AI, the API model ID is:

xiaomimimo/mimo-v2.5-pro

The Novita AI model listing currently verifies these implementation details:

| Field | Current value |
| --- | --- |
| Access path | Serverless API |
| Endpoint family | OpenAI-compatible chat completions |
| Base URL | https://api.novita.ai/openai |
| Model ID | xiaomimimo/mimo-v2.5-pro |
| Context length | 1,048,576 tokens |
| Max output | 131,072 tokens |
| Input capability | Text |
| Output capability | Text |
| Function calling | Supported |
| Structured output | Supported |
| Reasoning | Supported |
| Anthropic API | Supported |

Upstream, the Xiaomi MiMo-V2.5-Pro Hugging Face model card describes MiMo-V2.5-Pro as an open-source Mixture-of-Experts language model with 1.02 trillion total parameters, 42 billion active parameters, and up to a 1 million-token context window. Treat the upstream model card as useful background for the model family, and use the Novita AI model page for Novita-specific model ID, pricing, availability, and endpoint details.

When to use MiMo-V2.5-Pro

MiMo-V2.5-Pro is most useful when your application needs a hosted text model for long-running, instruction-heavy work rather than a short, single-turn answer. Good evaluation targets include repository analysis, multi-file refactoring plans, long-context document synthesis, agent planning, tool-routing prototypes, and structured extraction tasks.

Use it when you need:

  • A verified 1,048,576-token context window on Novita AI.
  • A high max-output setting for detailed plans, code reviews, migration outlines, or multi-step reasoning traces.
  • Function calling support for routing model decisions into application tools.
  • Structured output support for JSON-like responses that downstream services can parse.
  • OpenAI-compatible chat completions so you can reuse existing SDK patterns.

Do not assume it is the lowest-cost or lowest-latency choice for every task. For short prompts, high-volume classification, or simple chat, compare current options in the Novita AI model library and Novita AI pricing page before routing all traffic to a long-context model.

Prerequisites

Before you make the first request, prepare four things:

  1. A Novita AI account.
  2. A Novita AI API key stored in an environment variable such as NOVITA_API_KEY.
  3. The OpenAI Python SDK or another HTTP client that can call OpenAI-compatible endpoints.
  4. A current pricing and limit check for xiaomimimo/mimo-v2.5-pro.

The current Novita AI model page lists token-based serverless pricing for MiMo-V2.5-Pro. It shows a general display price of $2 per million input tokens, $0.4 per million cached-read tokens, and $6 per million output tokens. It also lists tiered pricing by input length:

| Input length | Input price per million tokens | Output price per million tokens | Cached-read price per million tokens |
| --- | --- | --- | --- |
| 1 to under 262,144 tokens | $1 | $3 | $0.2 |
| 262,144 to under 1,048,576 tokens | $2 | $6 | $0.4 |

Because model pages and pricing tables can change, confirm current rates on the model page or pricing page before production use, especially for long prompts where the second tier can apply.
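The tiered prices listed above can be turned into a rough per-request estimate. The sketch below is a minimal illustration, not Novita AI's billing logic. It assumes the tier is chosen by the request's total input token count and that cached-read tokens are billed at the cached rate instead of the input rate. The names `PriceTier`, `PRICE_TIERS`, and `estimate_cost_usd` are hypothetical, and the prices are copied from the table as listed at the time of writing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceTier:
    max_input_tokens: int  # exclusive upper bound of the tier
    input_per_m: float     # USD per million uncached input tokens
    output_per_m: float    # USD per million output tokens
    cached_per_m: float    # USD per million cached-read tokens


# Values copied from the pricing table above; re-check them before relying on this.
PRICE_TIERS = (
    PriceTier(262_144, 1.0, 3.0, 0.2),
    PriceTier(1_048_576, 2.0, 6.0, 0.4),
)


def _select_tier(input_tokens: int) -> PriceTier:
    """Pick the first tier whose upper bound is above the input size (assumed rule)."""
    for tier in PRICE_TIERS:
        if input_tokens < tier.max_input_tokens:
            return tier
    raise ValueError("input size is outside the listed pricing tiers")


def estimate_cost_usd(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the cost of one request in USD under the assumptions stated above."""
    if input_tokens < 1 or output_tokens < 0:
        raise ValueError("token counts must be positive")
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    tier = _select_tier(input_tokens)
    uncached = input_tokens - cached_tokens
    total = (
        uncached * tier.input_per_m
        + cached_tokens * tier.cached_per_m
        + output_tokens * tier.output_per_m
    )
    return total / 1_000_000


if __name__ == "__main__":
    print(f"short request: ${estimate_cost_usd(100_000, 2_000):.4f}")
    print(f"long request:  ${estimate_cost_usd(500_000, 10_000, cached_tokens=200_000):.4f}")
```

Running an estimate like this on representative prompts before launch makes it easier to see which workflows cross into the second tier.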

First API request

The simplest first call uses the OpenAI SDK with Novita AI’s base URL and the MiMo-V2.5-Pro model ID.

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["NOVITA_API_KEY"],
    base_url="https://api.novita.ai/openai",
)

response = client.chat.completions.create(
    model="xiaomimimo/mimo-v2.5-pro",
    messages=[
        {"role": "system", "content": "You are a practical coding assistant."},
        {
            "role": "user",
            "content": "Review this API design and list the main reliability risks.",
        },
    ],
    max_tokens=1024,
    temperature=0.7,
)

print(response.choices[0].message.content)

If you prefer REST, call the current chat completions path directly:

curl --request POST \
  --url https://api.novita.ai/openai/v1/chat/completions \
  --header "Authorization: Bearer $NOVITA_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "model": "xiaomimimo/mimo-v2.5-pro",
    "messages": [
      {
        "role": "system",
        "content": "You are a practical coding assistant."
      },
      {
        "role": "user",
        "content": "Create a checklist for validating a payment webhook integration."
      }
    ],
    "max_tokens": 1024,
    "temperature": 0.7
  }'

For production code, keep API keys out of source control, cap max_tokens to the actual amount your workflow needs, and log token usage so you can see when long-context prompts start to dominate cost.
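One way to log token usage is a small in-process ledger keyed by feature name. This is a minimal sketch, assuming the response exposes the OpenAI-style `usage` object with `prompt_tokens` and `completion_tokens`, and that `usage` may be absent on some responses. `UsageLedger` is a hypothetical helper, and the demo uses a stand-in object rather than a real API response.

```python
from collections import defaultdict
from types import SimpleNamespace


class UsageLedger:
    """Accumulates token counts per feature so cost hot spots become visible."""

    def __init__(self):
        self._totals = defaultdict(
            lambda: {"prompt_tokens": 0, "completion_tokens": 0, "requests": 0}
        )

    def record(self, feature, response):
        """Add one response's usage to the ledger; return False if usage is missing."""
        usage = getattr(response, "usage", None)
        if usage is None:
            return False
        entry = self._totals[feature]
        entry["prompt_tokens"] += usage.prompt_tokens
        entry["completion_tokens"] += usage.completion_tokens
        entry["requests"] += 1
        return True

    def summary(self):
        """Return a plain-dict snapshot suitable for logging or metrics export."""
        return {feature: dict(totals) for feature, totals in self._totals.items()}


if __name__ == "__main__":
    # Stand-in for a chat completion response; a real one comes from the SDK call.
    fake_response = SimpleNamespace(
        usage=SimpleNamespace(prompt_tokens=1200, completion_tokens=300, total_tokens=1500)
    )
    ledger = UsageLedger()
    ledger.record("api-design-review", fake_response)
    print(ledger.summary())
```

In a real service the same counts would more likely go to a metrics system than stay in memory.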

Streaming and long-context usage notes

The Novita AI chat completions reference includes stream and stream_options fields, and the MiMo-V2.5-Pro model page verifies a 1,048,576-token context length with a 131,072-token max output. Use those limits as engineering ceilings, not default settings.

For a streamed response:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["NOVITA_API_KEY"],
    base_url="https://api.novita.ai/openai",
)

stream = client.chat.completions.create(
    model="xiaomimimo/mimo-v2.5-pro",
    messages=[
        {"role": "system", "content": "You write concise engineering plans."},
        {
            "role": "user",
            "content": "Draft a step-by-step rollback plan for a failed database migration.",
        },
    ],
    max_tokens=2048,
    temperature=0.7,
    stream=True,
)

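for chunk in stream:
    # Some chunks (for example a trailing usage chunk) may carry no choices.
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()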

For long-context prompts, keep these practices in place:

  • Put the task, output format, and stop conditions near the beginning of the conversation.
  • Split unrelated files or documents into clearly labeled sections.
  • Ask the model to cite section names or file paths from the supplied context rather than inventing locations.
  • Start with smaller max_tokens values during development, then raise the cap only when the workflow needs longer output.
  • Track prompt, completion, and total tokens from the usage object when the response includes it.
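The first three practices in the list above can be sketched as a small prompt builder. It puts the task, output format, and stop condition first, then wraps each document in labeled markers the model can cite. The marker format and the name `build_long_context_prompt` are illustrative assumptions, not a format that the model or Novita AI requires.

```python
def build_long_context_prompt(task, output_format, stop_condition, sections):
    """Assemble one user message from a mapping of section label -> text.

    Labels are typically file paths or document names supplied by the caller.
    """
    if not sections:
        raise ValueError("at least one section is required")
    header = [
        f"TASK: {task}",
        f"OUTPUT FORMAT: {output_format}",
        f"STOP WHEN: {stop_condition}",
        "Cite only the section labels listed below; do not invent locations.",
        "SECTIONS: " + ", ".join(sections),
    ]
    body = [
        f"=== BEGIN {label} ===\n{text.strip()}\n=== END {label} ==="
        for label, text in sections.items()
    ]
    return "\n".join(header) + "\n\n" + "\n\n".join(body)


if __name__ == "__main__":
    prompt = build_long_context_prompt(
        task="List retry-related bugs.",
        output_format="Bullet list with the section label for each finding.",
        stop_condition="every section has been reviewed once",
        sections={
            "billing/jobs.py": "def run():\n    pass\n",
            "docs/retries.md": "Retries use exponential backoff.",
        },
    )
    print(prompt)
```

The returned string can be passed as the `content` of the user message in any of the requests shown earlier.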

Function calling example

The MiMo-V2.5-Pro model page verifies function calling support, and the chat completions reference includes a tools parameter with function metadata. Use this when the model should choose an application action, such as looking up an internal ticket, fetching an account state, or creating a deployment task.

import json
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["NOVITA_API_KEY"],
    base_url="https://api.novita.ai/openai",
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "create_deployment_ticket",
            "description": "Create a deployment follow-up ticket.",
            "parameters": {
                "type": "object",
                "properties": {
                    "service": {
                        "type": "string",
                        "description": "The service that needs follow-up.",
                    },
                    "priority": {
                        "type": "string",
                        "enum": ["low", "medium", "high"],
                    },
                    "summary": {
                        "type": "string",
                        "description": "A short ticket summary.",
                    },
                },
                "required": ["service", "priority", "summary"],
                "additionalProperties": False,
            },
            "strict": True,
        },
    }
]

response = client.chat.completions.create(
    model="xiaomimimo/mimo-v2.5-pro",
    messages=[
        {
            "role": "system",
            "content": "Decide whether the user's deployment note requires a follow-up ticket.",
        },
        {
            "role": "user",
            "content": "Checkout latency increased after the payment-service deploy. Create a high priority follow-up.",
        },
    ],
    tools=tools,
    max_tokens=1024,
    temperature=0.2,
)

message = response.choices[0].message

if message.tool_calls:
    tool_call = message.tool_calls[0]
    args = json.loads(tool_call.function.arguments)
    print(args)
else:
    print(message.content)

In a real application, validate the tool arguments server-side before executing any action. Function calling gives your application a structured decision from the model; it does not replace authorization, input validation, audit logging, or rollback controls.
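Server-side validation of the tool arguments might look like the sketch below. It re-checks the arguments string against the same constraints as the `create_deployment_ticket` schema and adds an application-level allowlist. The names `validate_ticket_args` and `KNOWN_SERVICES`, the example service names, and the 200-character summary limit are all hypothetical. Authorization and audit logging are left out.

```python
import json

ALLOWED_PRIORITIES = {"low", "medium", "high"}
EXPECTED_KEYS = {"service", "priority", "summary"}
# Hypothetical allowlist; a real application would load this from its own registry.
KNOWN_SERVICES = {"payment-service", "checkout-service"}


def validate_ticket_args(raw_arguments):
    """Parse and validate tool-call arguments; raise ValueError on any problem."""
    try:
        args = json.loads(raw_arguments)
    except (TypeError, json.JSONDecodeError) as exc:
        raise ValueError("tool arguments are not valid JSON") from exc
    if not isinstance(args, dict) or set(args) != EXPECTED_KEYS:
        raise ValueError("tool arguments must contain exactly service, priority, summary")
    if not all(isinstance(args[key], str) for key in EXPECTED_KEYS):
        raise ValueError("all tool arguments must be strings")
    if args["priority"] not in ALLOWED_PRIORITIES:
        raise ValueError("priority is outside the allowed enum")
    if args["service"] not in KNOWN_SERVICES:
        raise ValueError("service is not a known deployment target")
    summary = args["summary"].strip()
    if not summary or len(summary) > 200:
        raise ValueError("summary must be 1 to 200 characters")
    return {"service": args["service"], "priority": args["priority"], "summary": summary}


if __name__ == "__main__":
    raw = '{"service": "payment-service", "priority": "high", "summary": "Checkout latency up"}'
    print(validate_ticket_args(raw))
```

Only the validated dictionary, never the raw model output, should reach the code that creates the ticket.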

Structured output example

The MiMo-V2.5-Pro model page also verifies structured output support, and the Novita AI chat completions reference includes response_format with JSON schema fields. Use structured output when you want the model to return parseable data rather than free-form prose.

import json
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["NOVITA_API_KEY"],
    base_url="https://api.novita.ai/openai",
)

response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "migration_risk_report",
        "schema": {
            "type": "object",
            "properties": {
                "risk_level": {
                    "type": "string",
                    "enum": ["low", "medium", "high"],
                },
                "main_risks": {
                    "type": "array",
                    "items": {"type": "string"},
                },
                "next_actions": {
                    "type": "array",
                    "items": {"type": "string"},
                },
            },
            "required": ["risk_level", "main_risks", "next_actions"],
            "additionalProperties": False,
        },
        "strict": True,
    },
}

response = client.chat.completions.create(
    model="xiaomimimo/mimo-v2.5-pro",
    messages=[
        {
            "role": "system",
            "content": "Return only a structured migration risk report.",
        },
        {
            "role": "user",
            "content": "We are moving billing jobs from cron to a queue and changing retry behavior.",
        },
    ],
    response_format=response_format,
    max_tokens=1024,
    temperature=0.2,
)

report = json.loads(response.choices[0].message.content)
print(json.dumps(report, indent=2))

If the response fails JSON parsing during testing, reduce ambiguity in the prompt, lower temperature, keep the schema compact, and retry with a shorter input. For production workflows, validate the parsed object against your own schema before using it.
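The validate-then-retry advice can be sketched with the standard library alone. `parse_risk_report` checks the parsed object against the same shape as the `migration_risk_report` schema. `get_report` retries a caller-supplied request function a bounded number of times. Both names are hypothetical, and `request_fn` stands in for whatever code issues the chat completion call and returns `message.content`.

```python
import json

RISK_LEVELS = {"low", "medium", "high"}
LIST_FIELDS = ("main_risks", "next_actions")


def parse_risk_report(raw):
    """Return the validated report dict, or None if parsing or validation fails."""
    if not isinstance(raw, str):
        return None
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or set(data) != {"risk_level", *LIST_FIELDS}:
        return None
    level = data["risk_level"]
    if not isinstance(level, str) or level not in RISK_LEVELS:
        return None
    for field in LIST_FIELDS:
        items = data[field]
        if not isinstance(items, list) or not all(isinstance(i, str) for i in items):
            return None
    return data


def get_report(request_fn, max_attempts=3):
    """Call request_fn until it yields a valid report or attempts run out."""
    for _ in range(max_attempts):
        report = parse_risk_report(request_fn())
        if report is not None:
            return report
    raise RuntimeError(f"no valid report after {max_attempts} attempts")


if __name__ == "__main__":
    # Stand-in responses: the first is truncated JSON, the second is valid.
    replies = iter([
        '{"risk_level": "high", "main_risks": [',
        '{"risk_level": "high", "main_risks": ["duplicate charges"], "next_actions": ["add idempotency keys"]}',
    ])
    print(get_report(lambda: next(replies)))
```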

Cost, latency, and limit cautions

MiMo-V2.5-Pro is built for long-context and agentic tasks, so cost control matters. A prompt close to the top context tier can be materially more expensive than a short request, and long outputs also increase completion-token cost. The Novita AI model page currently lists tiered input/output pricing, so estimate both prompt and completion tokens before routing large jobs automatically.

Use these controls:

  • Set a practical max_tokens cap instead of using the maximum output limit by default.
  • Summarize or retrieve only the context needed for the current task.
  • Cache or reuse stable context where your architecture supports it.
  • Monitor token usage per feature, user, and workflow.
  • Add timeouts and retries around network calls.
  • Keep a smaller fallback model for short tasks if your product does not need MiMo-V2.5-Pro on every request.

Latency can vary with prompt length, output length, streaming mode, and current service conditions. For user-facing applications, stream longer answers when appropriate and design the UI around incremental output rather than a single blocking response.
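The timeout and retry control listed above can be sketched as a generic wrapper with exponential backoff and jitter. This is an illustration under stated assumptions. `call_with_retries` is a hypothetical helper, the default delays are arbitrary, and the exception types worth retrying depend on your HTTP client or SDK, so they are passed in as a parameter. The OpenAI Python SDK client also accepts its own `timeout` and `max_retries` arguments, which may be enough for simple cases.

```python
import random
import time


def call_with_retries(
    fn,
    max_attempts=4,
    base_delay=0.5,
    max_delay=8.0,
    retry_on=(TimeoutError, ConnectionError),
    sleep=time.sleep,
):
    """Call fn(); on a retryable error, wait with backoff plus jitter and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error to the caller.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Jitter spreads out retries from many clients hitting the same failure.
            sleep(delay + random.uniform(0, delay / 2))


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_request():
        # Stand-in for a network call that fails twice before succeeding.
        state["calls"] += 1
        if state["calls"] < 3:
            raise TimeoutError("simulated timeout")
        return "ok"

    print(call_with_retries(flaky_request, base_delay=0.01))
```

Retrying is only safe for requests that can be repeated without side effects, which matters once tool calls trigger real actions.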

Troubleshooting

If your first MiMo-V2.5-Pro request fails, check these items first.

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| Authentication error | Missing or malformed API key | Send Authorization: Bearer $NOVITA_API_KEY for REST calls or pass api_key to the SDK client. |
| Model not found | Incorrect model ID | Use xiaomimimo/mimo-v2.5-pro exactly as listed on the Novita AI model page. |
| Request path error | Mixed base URLs | Use https://api.novita.ai/openai with the OpenAI SDK, or https://api.novita.ai/openai/v1/chat/completions for REST. |
| Context or output error | Prompt plus output request exceeds model limits | Keep total prompt length within the current context window and cap max_tokens below the verified max output. |
| Tool call is missing | Prompt does not require a tool, or tool schema is unclear | Make the tool decision explicit and keep the function schema concise. |
| Structured output fails to parse | Schema or prompt is too loose | Use response_format, set strict when appropriate, lower temperature, and validate the result. |
| Unexpected cost | Large prompt, large output, or upper pricing tier | Check current pricing, log token usage, and reduce context or max output. |

For endpoint details, refer to the Novita AI chat completions API reference. For model-specific limits and pricing, refer to the Xiaomi MiMo-V2.5-Pro API and playground page.

Next steps

Start with the Xiaomi MiMo-V2.5-Pro model page, test a small prompt in the playground, then move the same prompt into your API client with the verified model ID. When you are ready to compare alternatives, use the Novita AI model library and Novita AI pricing page to check current pricing, context windows, output limits, and capability support.

For agent-style applications, evaluate MiMo-V2.5-Pro against your own traces: repository edits, tool-call routing, structured extraction, long-context summarization, and recovery from ambiguous instructions. Keep the evaluation tied to your application’s real prompts rather than generic benchmark claims.

