MiniMax M2.5 Access Guide: Web, API, CLI, Self-Host 2026

How to Access MiniMax M2.5: API vs Self-Hosted Deployment Guide

MiniMax M2.5 is a 229-billion-parameter sparse Mixture-of-Experts (MoE) model with only 10B parameters active per token, enabling efficient inference despite its scale. Released by Chinese AI company MiniMax, it ranks among the top open-source models for autonomous coding and web navigation tasks, achieving 80.2% on SWE-Bench Verified and 76.3% on BrowseComp.

Novita AI also serves an accelerated variant of the model that matches the standard version's quality while significantly improving inference speed.

Access Methods Comparison

| Method | Setup Time | Cost | Best For |
|---|---|---|---|
| Web Playground | 0 minutes | Free (rate-limited) | First-time evaluation, one-off tasks |
| Novita AI API | 2 minutes | Input $0.3/Mt, Cache Read $0.03/Mt, Output $1.2/Mt | Production apps, moderate volume, rapid prototyping |
| NovitaClaw | 5 minutes | Input $0.3/Mt, Cache Read $0.03/Mt, Output $1.2/Mt | Terminal automation, DevOps workflows |
| Claude Code | 5 minutes | Input $0.3/Mt, Cache Read $0.03/Mt, Output $1.2/Mt | Codebase exploration, IDE integration |
| Local (Q4_K_M) | 30–60 minutes | One-time investment: $60,000–$90,000 | High-volume production, data privacy requirements |
| Cloud GPU | 5 minutes | 8× GPU $11.60/hr | Short-term experiments, burst workloads, large-model testing |

1. Web Playground

The fastest zero-barrier entry point is Novita AI’s web playground—no signup, no API keys, instant evaluation. This works best for quick capability testing before committing to API integration or local deployment.

Typical use cases: Prompt engineering, quality evaluation, coding task testing, comparing outputs with other models side-by-side. Web playground is ideal for first-time evaluation and one-off tasks—no technical setup required.


2. Novita AI API

Why Choose Novita AI API?

  • OpenAI-compatible and Anthropic-compatible endpoints
  • Competitive pricing: $0.30/$1.20 per 1M tokens.
  • Cache pricing support: previously processed prompt prefixes are billed at the reduced cache-read rate ($0.03/Mt), cutting repeated computation and lowering overall costs.
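The cache-read rate matters most when a long prompt prefix is reused across many calls. A rough cost sketch using the rates above (the prompt and request sizes are hypothetical):

```python
# Illustrative cost math for cache pricing (rates in $ per 1M tokens).
# The prompt/request sizes below are made-up examples.
INPUT_RATE = 0.30       # fresh input tokens
CACHE_READ_RATE = 0.03  # cached input tokens
OUTPUT_RATE = 1.20      # output tokens

def request_cost(fresh_in, cached_in, out):
    """Dollar cost of one request."""
    return (fresh_in * INPUT_RATE
            + cached_in * CACHE_READ_RATE
            + out * OUTPUT_RATE) / 1_000_000

# 100 requests reusing a 50k-token prompt prefix, each adding 1k new
# input tokens and producing 2k output tokens.
no_cache = 100 * request_cost(51_000, 0, 2_000)
with_cache = (request_cost(51_000, 0, 2_000)           # first call fills the cache
              + 99 * request_cost(1_000, 50_000, 2_000))

print(f"no cache:   ${no_cache:.2f}")
print(f"with cache: ${with_cache:.2f}")
```

In this sketch, caching cuts the bill from about $1.77 to about $0.43, roughly a 4x reduction.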

Setup Guide

Step 1: Log In and Access the Model Library

Log in to your account and click on the Model Library button.


Step 2: Choose Your Model

Browse through the available options and select the model that suits your needs.


Step 3: Start Your Free Trial

Begin your free trial to explore the capabilities of the selected model.

Novita AI provides OpenAI-compatible endpoints for MiniMax M2.5

Step 4: Get Your API Key

To authenticate with the API, you need an API key. Open the “Settings” page and copy your API key.


Step 5: Install the SDK

Install the SDK using the package manager for your programming language (for Python: pip install openai).

After installation, import the necessary libraries into your development environment and initialize the client with your API key to start interacting with the Novita AI LLM endpoint. Here is an example using the chat completions API for Python users.

from openai import OpenAI

client = OpenAI(
    api_key="<Your API Key>",
    base_url="https://api.novita.ai/openai"
)

response = client.chat.completions.create(
    model="minimax/minimax-m2.5",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, how are you?"}
    ],
    max_tokens=131100,
    temperature=0.7
)

print(response.choices[0].message.content)
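Production calls can fail transiently (rate limits, network blips). A minimal retry helper, sketched here as a generic pattern rather than anything built into the OpenAI SDK:

```python
import time

def with_retries(call, max_attempts=3, base_delay=1.0):
    """Retry a zero-argument callable with exponential backoff.

    Generic sketch, not part of the OpenAI SDK; in production you
    would catch the SDK's specific exceptions (rate-limit or
    connection errors) instead of bare Exception.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            time.sleep(base_delay * 2 ** attempt)

# Usage with the client above:
# response = with_retries(lambda: client.chat.completions.create(...))
```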

3. Code Tool Access

NovitaClaw

NovitaClaw is a command-line tool for deploying and managing persistent OpenClaw agents on the Novita Agent Sandbox. With a single command, you can launch a fully hosted agent instance that runs continuously—no session limits or manual restarts required. Once deployed, the agent can be accessed and controlled through multiple interfaces, including the CLI, a web-based UI, or external automation scripts.

Getting Started

Prerequisites

Before you begin, make sure you have:

  • Python installed
  • A Novita API key (create or manage keys in Key Management)

Step 1: Install NovitaClaw

macOS / Linux:

sudo pip3 install novitaclaw

Windows PowerShell:

pip install novitaclaw

Verify: run novitaclaw --help. If you see a list of commands, installation was successful.

Step 2: Set Your API Key

macOS / Linux:

export NOVITA_API_KEY=sk_your_api_key

Windows PowerShell:

$env:NOVITA_API_KEY = "sk_your_api_key"

Step 3: Launch Your Instance

novitaclaw launch

On success, the CLI returns:

  • Web UI URL — Chat with your agent
  • Gateway WebSocket URL & Token — For programmatic access
  • Web Terminal URL — Browser-based terminal access
  • File Manager URL — Manage workspace files
  • Login credentials — For Web Terminal and File Manager

Open the Web UI URL, go to the Chat tab, and start using your agent.

Configuring Models

Your instance comes pre-configured with a Novita-hosted model by default. To customize it:

Go to:
Settings → Config → Raw (JSON5 view)

Click “secrets redacted” to reveal the full configuration.

Step 1: Register a Model

Add a new entry under models.providers.novita.models:

{
  "models": {
    "providers": {
      "novita": {
        "models": [
          {
            "id": "model-id",
            "name": "display name",
            "reasoning": true,
            "input": ["text"],
            "contextWindow": 200000,
            "maxTokens": 50000
          }
        ]
      }
    }
  }
}

Step 2: Set as Primary or Fallback

Update agents.defaults:

{
  "agents": {
    "defaults": {
      "model": {
        "primary": "novita/model-id",
        "fallbacks": ["novita/fallback-model-id"]
      }
    }
  }
}
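The two fragments above are valid plain JSON as well as JSON5, so they can be sanity-checked before pasting. The snippet below (an illustrative check, not part of OpenClaw) verifies that the primary model referenced in agents.defaults is actually registered under the provider:

```python
import json

# The two config fragments from the steps above, as plain JSON.
models_cfg = json.loads("""
{"models": {"providers": {"novita": {"models": [
  {"id": "model-id", "name": "display name", "reasoning": true,
   "input": ["text"], "contextWindow": 200000, "maxTokens": 50000}
]}}}}
""")
agents_cfg = json.loads("""
{"agents": {"defaults": {"model": {
  "primary": "novita/model-id",
  "fallbacks": ["novita/fallback-model-id"]
}}}}
""")

# The primary reference is "<provider>/<model-id>"; make sure that id
# exists in the provider's registered model list.
provider, model_id = agents_cfg["agents"]["defaults"]["model"]["primary"].split("/", 1)
registered = {m["id"] for m in models_cfg["models"]["providers"][provider]["models"]}
print(model_id in registered)  # True
```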

Claude Code

Claude Code is Anthropic’s official CLI agent, primarily designed for Claude models but compatible with Anthropic-API-compatible endpoints like Novita AI. It excels at whole-repository analysis, complex debugging, and agentic coding loops.

Setup:

1. Install Claude Code:

#macOS, Linux, WSL:
curl -fsSL https://claude.ai/install.sh | bash

#Windows PowerShell:
irm https://claude.ai/install.ps1 | iex

#Windows CMD:
curl -fsSL https://claude.ai/install.cmd -o install.cmd && install.cmd && del install.cmd
#Windows requires Git for Windows. Install it first if you don’t have it.

2. Set environment variables:

# Set the Anthropic SDK compatible API endpoint provided by Novita.
export ANTHROPIC_BASE_URL="https://api.novita.ai/anthropic"
export ANTHROPIC_AUTH_TOKEN="<Novita API Key>"
# Set the model provided by Novita.
export ANTHROPIC_MODEL="minimax/minimax-m2.5"
export ANTHROPIC_SMALL_FAST_MODEL="minimax/minimax-m2.5"

3. Start Claude Code in your project:

cd /path/to/project
claude

Best for: Codebase exploration, multi-step debugging, autonomous feature implementation, integration with VSCode/Cursor via terminal plugins.

4. Local Deployment

MiniMax M2.5’s sparse MoE architecture (229B total, 10B active) makes local deployment viable on high-end consumer hardware or multi-GPU setups. The model requires 457GB at full BF16 precision, but quantization via Unsloth’s GGUF quantizations shrinks this to 101GB (Dynamic 3-bit) or 138GB (Q4_K_M).

Hardware Requirements

| Quantization | VRAM Needed | Hardware Example |
|---|---|---|
| BF16 (full precision) | 457GB | 6× H100 80GB |
| Q8_0 | 243GB | 4× H100 80GB |
| Q6_K | 188GB | 3× H100 80GB |
| Q4_K_M (recommended) | 138GB | 2× H100 80GB |
| Q3_K_M | 109GB | 2× H100 80GB |
| UD-IQ2_XXS (minimum) | 74GB | Single H100 80GB |
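The VRAM figures above roughly track parameters × bits-per-weight. A back-of-the-envelope estimate (the effective bits-per-weight values below are approximations, since K-quants mix precisions across tensors):

```python
# Rough GGUF weight size: parameters x bits-per-weight / 8 bytes.
# Bits-per-weight values are approximate; real quant files can run
# a few GB larger than this estimate.
TOTAL_PARAMS = 229e9  # MiniMax M2.5 total parameter count

def approx_size_gb(bits_per_weight):
    return TOTAL_PARAMS * bits_per_weight / 8 / 1e9

for name, bits in [("BF16", 16.0), ("Q8_0", 8.5), ("Q4_K_M", 4.8), ("Q3_K_M", 3.8)]:
    print(f"{name}: ~{approx_size_gb(bits):.0f} GB")
```

The estimates land within a few GB of the table above (for example ~458GB vs. 457GB for BF16), before accounting for KV cache and runtime overhead.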

Installation (llama.cpp)

git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp

# Add -DGGML_CUDA=ON to build with NVIDIA GPU support
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j"$(nproc)"

# Install HF CLI if needed
pip install -U "huggingface_hub[cli]"

# Download a specific quant (example: Q3_K_M)
hf download unsloth/MiniMax-M2.5-GGUF \
  --include "Q3_K_M/*" \
  --local-dir ./models

# Check files
find ./models -name "*.gguf"

# Run (use the FIRST shard)
./build/bin/llama-cli \
  -m ./models/Q3_K_M/MiniMax-M2.5-Q3_K_M-00001-of-00004.gguf \
  -p "Write a Python function to check if a number is prime"

5. Cloud GPU Deployment (Cost-Effective)

Step 1: Register an account

Create your Novita AI account through our website. After registration, navigate to the “Explore” section in the left sidebar to view our GPU offerings and begin your AI development journey.


Step 2: Exploring Templates and GPU Servers

Choose from templates like PyTorch, TensorFlow, or CUDA that match your project needs. Then select your preferred GPU configuration—options include the powerful L40S, RTX 4090 or A100 SXM4, each with different VRAM, RAM, and storage specifications.


Step 3: Tailor Your Deployment

Customize your environment by selecting your preferred operating system and configuration options to ensure optimal performance for your specific AI workloads and development needs.

| Specification | Billing Method | GPU | Pricing |
|---|---|---|---|
| H100 80GB VRAM | On-Demand | 1× GPU | $1.45/hr |
| H100 80GB VRAM | On-Demand | 8× GPU | $11.60/hr |
| H100 80GB VRAM | Spot | 1× GPU | $0.73/hr |
| H100 80GB VRAM | Spot | 8× GPU | $5.84/hr |

Novita AI’s Spot instance is a cost-optimized GPU rental system that leverages the platform’s idle or unused GPU capacity. Unlike on-demand instances, which reserve dedicated hardware for stable, continuous usage, Spot instances are interruptible—your job may be paused or terminated if the GPU is reclaimed by the system. Because Spot mode reallocates otherwise idle GPU resources, it is typically 40–60% cheaper than on-demand pricing.
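Using the 8× H100 rates quoted above, the spot discount can be checked directly:

```python
# Spot vs. on-demand savings for the 8x H100 rates quoted above.
on_demand_8x = 11.60  # $/hr, on-demand
spot_8x = 5.84        # $/hr, spot

savings = 1 - spot_8x / on_demand_8x
print(f"Spot saves {savings:.0%} vs. on-demand")
```

That works out to roughly 50%, consistent with the 40–60% range above.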

MiniMax M2.5 offers several practical access paths, each optimized for different scenarios. For most developers, Novita AI's API at $0.30/$1.20 per million tokens provides the fastest path to production: setup takes 2 minutes with OpenAI SDK compatibility. The web playground serves first-time evaluation, while NovitaClaw and Claude Code enable terminal-integrated agentic workflows for power users. Self-hosting makes economic sense only at sustained high token volumes or when strict data privacy requirements prohibit cloud APIs; in that case, Q4_K_M quantization on 2× H100 80GB delivers production-ready performance.

Frequently Asked Questions

What makes MiniMax M2.5 different from other coding models?

MiniMax M2.5 uses sparse MoE architecture with 229B total parameters but only 10B active per token, achieving 80.2% on SWE-Bench Verified at 8% of Claude Sonnet 4.5’s cost.

Can I run MiniMax M2.5 on a single consumer GPU?

No—the minimum VRAM requirement is 74GB even with aggressive quantization.

Does MiniMax M2.5 support function calling and structured outputs?

Yes—MiniMax M2.5 supports function calling via OpenAI-compatible API format. 

Novita AI is an AI & agent cloud platform helping developers and startups build, deploy, and scale models and agentic applications with high performance, reliability, and cost efficiency.
