MiniMax M2.5 Access Guide: Web, API, CLI, Self-Host 2026

How to Access MiniMax M2.5: API vs Self-Hosted Deployment Guide

MiniMax M2.5 is a 229-billion-parameter sparse Mixture-of-Experts (MoE) model with only 10B parameters active per token, enabling efficient inference despite its scale. Released by Chinese AI company MiniMax, it ranks among the top open-source models for autonomous coding and web navigation tasks, achieving 80.2% on SWE-Bench Verified and 76.3% on BrowseComp.

Novita AI also serves an accelerated variant of the model that matches the standard version's quality while significantly improving inference speed.

Access Methods Comparison

| Method | Setup Time | Cost | Best For |
|---|---|---|---|
| Web Playground | 0 minutes | Free (rate-limited) | First-time evaluation, one-off tasks |
| Novita AI API | 2 minutes | Input $0.3/Mt, Cache Read $0.03/Mt, Output $1.2/Mt | Production apps, moderate volume, rapid prototyping |
| NovitaClaw | 5 minutes | Input $0.3/Mt, Cache Read $0.03/Mt, Output $1.2/Mt | Terminal automation, DevOps workflows |
| Claude Code | 5 minutes | Input $0.3/Mt, Cache Read $0.03/Mt, Output $1.2/Mt | Codebase exploration, IDE integration |
| Local (Q4_K_M) | 30–60 minutes | One-time investment: $60,000–$90,000 | High-volume production, data privacy requirements |
| Cloud GPU | 5 minutes | 8× GPU $11.60/hr | Short-term experiments, burst workloads, large-model testing |

1. Web Playground

The fastest zero-barrier entry point is Novita AI’s web playground—no signup, no API keys, instant evaluation. This works best for quick capability testing before committing to API integration or local deployment.

Typical use cases: Prompt engineering, quality evaluation, coding task testing, comparing outputs with other models side-by-side. Web playground is ideal for first-time evaluation and one-off tasks—no technical setup required.


2. Novita AI API

Why Choose Novita AI API?

  • OpenAI-compatible and Anthropic-compatible endpoints
  • Competitive pricing: $0.30/$1.20 per 1M tokens.
  • Cache pricing support: previously processed prompt prefixes are billed at the reduced cache-read rate ($0.03/Mt), cutting repeated computation and lowering overall costs.
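The cache-read rate matters most when a long prompt prefix is reused across many calls. A rough cost sketch using the rates above (the prompt and request sizes are hypothetical):

```python
# Illustrative cost math for cache pricing (rates in $ per 1M tokens).
# The prompt/request sizes below are made-up examples.
INPUT_RATE = 0.30       # fresh input tokens
CACHE_READ_RATE = 0.03  # cached input tokens
OUTPUT_RATE = 1.20      # output tokens

def request_cost(fresh_in, cached_in, out):
    """Dollar cost of one request."""
    return (fresh_in * INPUT_RATE
            + cached_in * CACHE_READ_RATE
            + out * OUTPUT_RATE) / 1_000_000

# 100 requests reusing a 50k-token prompt prefix, each adding 1k new
# input tokens and producing 2k output tokens.
no_cache = 100 * request_cost(51_000, 0, 2_000)
with_cache = (request_cost(51_000, 0, 2_000)           # first call fills the cache
              + 99 * request_cost(1_000, 50_000, 2_000))

print(f"no cache:   ${no_cache:.2f}")
print(f"with cache: ${with_cache:.2f}")
```

In this sketch, caching cuts the bill from about $1.77 to about $0.43, roughly a 4x reduction.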

Setup Guide

Step 1: Log In and Access the Model Library

Log in to your account and click on the Model Library button.


Step 2: Choose Your Model

Browse through the available options and select the model that suits your needs.


Step 3: Start Your Free Trial

Begin your free trial to explore the capabilities of the selected model.

Novita AI provides OpenAI-compatible endpoints for MiniMax M2.5

Step 4: Get Your API Key

To authenticate with the API, you need an API key. Open the “Settings” page and copy your API key.


Step 5: Install the SDK

Install the SDK using the package manager for your programming language (for Python: pip install openai).

After installation, import the necessary libraries into your development environment and initialize the client with your API key to start interacting with the Novita AI LLM endpoint. Here is an example using the chat completions API for Python users.

from openai import OpenAI

client = OpenAI(
    api_key="<Your API Key>",
    base_url="https://api.novita.ai/openai"
)

response = client.chat.completions.create(
    model="minimax/minimax-m2.5",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, how are you?"}
    ],
    max_tokens=131100,
    temperature=0.7
)

print(response.choices[0].message.content)
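Production calls can fail transiently (rate limits, network blips). A minimal retry helper, sketched here as a generic pattern rather than anything built into the OpenAI SDK:

```python
import time

def with_retries(call, max_attempts=3, base_delay=1.0):
    """Retry a zero-argument callable with exponential backoff.

    Generic sketch, not part of the OpenAI SDK; in production you
    would catch the SDK's specific exceptions (rate-limit or
    connection errors) instead of bare Exception.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            time.sleep(base_delay * 2 ** attempt)

# Usage with the client above:
# response = with_retries(lambda: client.chat.completions.create(...))
```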

3. Code Tool Access

NovitaClaw

NovitaClaw is a command-line tool for deploying and managing persistent OpenClaw agents on the Novita Agent Sandbox. With a single command, you can launch a fully hosted agent instance that runs continuously—no session limits or manual restarts required. Once deployed, the agent can be accessed and controlled through multiple interfaces, including the CLI, a web-based UI, or external automation scripts.

Getting Started

Prerequisites

Before you begin, make sure you have:

  • Python installed
  • A Novita API key (create or manage keys in Key Management)

Step 1: Install NovitaClaw

macOS / Linux:

sudo pip3 install novitaclaw

Windows PowerShell:

pip install novitaclaw

Verify: run novitaclaw --help. If you see a list of commands, installation was successful.

Step 2: Set Your API Key

macOS / Linux:

export NOVITA_API_KEY=sk_your_api_key

Windows PowerShell:

$env:NOVITA_API_KEY = "sk_your_api_key"

Step 3: Launch Your Instance

novitaclaw launch

On success, the CLI returns:

  • Web UI URL — Chat with your agent
  • Gateway WebSocket URL & Token — For programmatic access
  • Web Terminal URL — Browser-based terminal access
  • File Manager URL — Manage workspace files
  • Login credentials — For Web Terminal and File Manager

Open the Web UI URL, go to the Chat tab, and start using your agent.

Configuring Models

Your instance comes pre-configured with a Novita-hosted model by default. To customize it:

Go to:
Settings → Config → Raw (JSON5 view)

Click “secrets redacted” to reveal the full configuration.

Step 1: Register a Model

Add a new entry under models.providers.novita.models:

{
  "models": {
    "providers": {
      "novita": {
        "models": [
          {
            "id": "model-id",
            "name": "display name",
            "reasoning": true,
            "input": ["text"],
            "contextWindow": 200000,
            "maxTokens": 50000
          }
        ]
      }
    }
  }
}

Step 2: Set as Primary or Fallback

Update agents.defaults:

{
  "agents": {
    "defaults": {
      "model": {
        "primary": "novita/model-id",
        "fallbacks": ["novita/fallback-model-id"]
      }
    }
  }
}
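The two fragments above are valid plain JSON as well as JSON5, so they can be sanity-checked before pasting. The snippet below (an illustrative check, not part of OpenClaw) verifies that the primary model referenced in agents.defaults is actually registered under the provider:

```python
import json

# The two config fragments from the steps above, as plain JSON.
models_cfg = json.loads("""
{"models": {"providers": {"novita": {"models": [
  {"id": "model-id", "name": "display name", "reasoning": true,
   "input": ["text"], "contextWindow": 200000, "maxTokens": 50000}
]}}}}
""")
agents_cfg = json.loads("""
{"agents": {"defaults": {"model": {
  "primary": "novita/model-id",
  "fallbacks": ["novita/fallback-model-id"]
}}}}
""")

# The primary reference is "<provider>/<model-id>"; make sure that id
# exists in the provider's registered model list.
provider, model_id = agents_cfg["agents"]["defaults"]["model"]["primary"].split("/", 1)
registered = {m["id"] for m in models_cfg["models"]["providers"][provider]["models"]}
print(model_id in registered)  # True
```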

Claude Code

Claude Code is Anthropic’s official CLI agent, primarily designed for Claude models but compatible with Anthropic-API-compatible endpoints like Novita AI. It excels at whole-repository analysis, complex debugging, and agentic coding loops.

Setup:

1. Install Claude Code:

#macOS, Linux, WSL:
curl -fsSL https://claude.ai/install.sh | bash

#Windows PowerShell:
irm https://claude.ai/install.ps1 | iex

#Windows CMD:
curl -fsSL https://claude.ai/install.cmd -o install.cmd && install.cmd && del install.cmd
#Windows requires Git for Windows. Install it first if you don’t have it.

2. Set environment variables:

# Set the Anthropic SDK compatible API endpoint provided by Novita.
export ANTHROPIC_BASE_URL="https://api.novita.ai/anthropic"
export ANTHROPIC_AUTH_TOKEN="<Novita API Key>"
# Set the model provided by Novita.
export ANTHROPIC_MODEL="minimax/minimax-m2.5"
export ANTHROPIC_SMALL_FAST_MODEL="minimax/minimax-m2.5"

3. Start Claude Code in your project:

cd /path/to/project
claude

Best for: Codebase exploration, multi-step debugging, autonomous feature implementation, integration with VSCode/Cursor via terminal plugins.

4. Local Deployment

MiniMax M2.5’s sparse MoE architecture (229B total, 10B active) makes local deployment viable on high-end consumer hardware or multi-GPU setups. The model requires 457GB at full BF16 precision, but quantization via Unsloth’s GGUF quantizations shrinks this to 101GB (Dynamic 3-bit) or 138GB (Q4_K_M).

Hardware Requirements

| Quantization | VRAM Needed | Hardware Example |
|---|---|---|
| BF16 (full precision) | 457GB | 6× H100 80GB |
| Q8_0 | 243GB | 4× H100 80GB |
| Q6_K | 188GB | 3× H100 80GB |
| Q4_K_M (recommended) | 138GB | 2× H100 80GB |
| Q3_K_M | 109GB | 2× H100 80GB |
| UD-IQ2_XXS (minimum) | 74GB | Single H100 80GB |
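The VRAM figures above roughly track parameters × bits-per-weight. A back-of-the-envelope estimate (the effective bits-per-weight values below are approximations, since K-quants mix precisions across tensors):

```python
# Rough GGUF weight size: parameters x bits-per-weight / 8 bytes.
# Bits-per-weight values are approximate; real quant files can run
# a few GB larger than this estimate.
TOTAL_PARAMS = 229e9  # MiniMax M2.5 total parameter count

def approx_size_gb(bits_per_weight):
    return TOTAL_PARAMS * bits_per_weight / 8 / 1e9

for name, bits in [("BF16", 16.0), ("Q8_0", 8.5), ("Q4_K_M", 4.8), ("Q3_K_M", 3.8)]:
    print(f"{name}: ~{approx_size_gb(bits):.0f} GB")
```

The estimates land within a few GB of the table above (for example ~458GB vs. 457GB for BF16), before accounting for KV cache and runtime overhead.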

Installation (llama.cpp)

git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp

# Add -DGGML_CUDA=ON to build with NVIDIA GPU support
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j"$(nproc)"

# Install HF CLI if needed
pip install -U "huggingface_hub[cli]"

# Download a specific quant (example: Q3_K_M)
hf download unsloth/MiniMax-M2.5-GGUF \
  --include "Q3_K_M/*" \
  --local-dir ./models

# Check files
find ./models -name "*.gguf"

# Run (use the FIRST shard)
./build/bin/llama-cli \
  -m ./models/Q3_K_M/MiniMax-M2.5-Q3_K_M-00001-of-00004.gguf \
  -p "Write a Python function to check if a number is prime"

5. Cloud GPU Deployment (Cost-Effective)

Step 1: Register an account

Create your Novita AI account through our website. After registration, navigate to the “Explore” section in the left sidebar to view our GPU offerings and begin your AI development journey.


Step 2: Exploring Templates and GPU Servers

Choose from templates like PyTorch, TensorFlow, or CUDA that match your project needs. Then select your preferred GPU configuration—options include the powerful L40S, RTX 4090 or A100 SXM4, each with different VRAM, RAM, and storage specifications.


Step 3: Tailor Your Deployment

Customize your environment by selecting your preferred operating system and configuration options to ensure optimal performance for your specific AI workloads and development needs.

| Specification | Billing Method | GPU | Pricing |
|---|---|---|---|
| H100 80GB VRAM | On-Demand | 1× GPU | $1.45/hr |
| H100 80GB VRAM | On-Demand | 8× GPU | $11.60/hr |
| H100 80GB VRAM | Spot | 1× GPU | $0.73/hr |
| H100 80GB VRAM | Spot | 8× GPU | $5.84/hr |

Novita AI’s Spot instance is a cost-optimized GPU rental system that leverages the platform’s idle or unused GPU capacity. Unlike on-demand instances, which reserve dedicated hardware for stable, continuous usage, Spot instances are interruptible—your job may be paused or terminated if the GPU is reclaimed by the system. Because Spot mode reallocates otherwise idle GPU resources, it is typically 40–60% cheaper than on-demand pricing.
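Using the 8× H100 rates quoted above, the spot discount can be checked directly:

```python
# Spot vs. on-demand savings for the 8x H100 rates quoted above.
on_demand_8x = 11.60  # $/hr, on-demand
spot_8x = 5.84        # $/hr, spot

savings = 1 - spot_8x / on_demand_8x
print(f"Spot saves {savings:.0%} vs. on-demand")
```

That works out to roughly 50%, consistent with the 40–60% range above.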

MiniMax M2.5 offers several practical access paths, each optimized for different scenarios. For most developers, Novita AI's API at $0.30/$1.20 per million tokens provides the fastest path to production: setup takes 2 minutes with OpenAI SDK compatibility. The web playground serves first-time evaluation, while NovitaClaw and Claude Code enable terminal-integrated agentic workflows for power users. Self-hosting makes economic sense only at sustained high token volumes or when strict data privacy requirements prohibit cloud APIs; in that case, Q4_K_M quantization on 2× H100 80GB delivers production-ready performance.

Frequently Asked Questions

What makes MiniMax M2.5 different from other coding models?

MiniMax M2.5 uses sparse MoE architecture with 229B total parameters but only 10B active per token, achieving 80.2% on SWE-Bench Verified at 8% of Claude Sonnet 4.5’s cost.

Can I run MiniMax M2.5 on a single consumer GPU?

No—the minimum VRAM requirement is 74GB even with aggressive quantization.

Does MiniMax M2.5 support function calling and structured outputs?

Yes—MiniMax M2.5 supports function calling via OpenAI-compatible API format. 

Novita AI is an AI & agent cloud platform helping developers and startups build, deploy, and scale models and agentic applications with high performance, reliability, and cost efficiency.
