By Novita AI · March 30, 2026
MiniMax M2.5 is a 229-billion parameter sparse Mixture-of-Experts model, enabling efficient inference despite its scale. Released by Chinese AI company MiniMax, it ranks among top open-source models for autonomous coding and web navigation tasks, achieving 80.2% on SWE-Bench Verified and 76.3% on BrowseComp.
Novita AI offers an accelerated version of the model that maintains the strong performance of the previous release while significantly improving inference speed.
The fastest zero-barrier entry point is Novita AI’s web playground—no signup, no API keys, instant evaluation. This works best for quick capability testing before committing to API integration or local deployment.
Typical use cases: prompt engineering, quality evaluation, coding task testing, and comparing outputs with other models side by side. The web playground is ideal for first-time evaluation and one-off tasks, since it requires no technical setup.
To authenticate with the API, you will need an API key. Open the “Settings” page and copy your API key as indicated in the image.
Step 5: Install the SDK
Install an OpenAI-compatible SDK using the package manager for your programming language (for Python: pip install openai).
After installation, import the necessary libraries into your development environment and initialize the client with your API key to start interacting with Novita AI LLM. Here is a Python example using the chat completions API:
from openai import OpenAI

client = OpenAI(
    api_key="<Your API Key>",
    base_url="https://api.novita.ai/openai",
)

response = client.chat.completions.create(
    model="minimax/minimax-m2.5",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, how are you?"},
    ],
    max_tokens=131100,
    temperature=0.7,
)

print(response.choices[0].message.content)
3. Code Tool Access
NovitaClaw
NovitaClaw is a command-line tool for deploying and managing persistent OpenClaw agents on the Novita Agent Sandbox. With a single command, you can launch a fully hosted agent instance that runs continuously—no session limits or manual restarts required. Once deployed, the agent can be accessed and controlled through multiple interfaces, including the CLI, a web-based UI, or external automation scripts.
Getting Started
Prerequisites
Before you begin, make sure you have:
Python installed
A Novita API key (create or manage keys in Key Management)
Step 1: Install NovitaClaw
macOS / Linux:
sudo pip3 install novitaclaw
Windows PowerShell:
pip install novitaclaw
Verify: run novitaclaw --help. If you see a list of commands, installation was successful.
Step 2: Set Your API Key
macOS / Linux:
export NOVITA_API_KEY=sk_your_api_key
Windows PowerShell:
$env:NOVITA_API_KEY = "sk_your_api_key"
Step 3: Launch Your Instance
novitaclaw launch
On success, the CLI returns:
Web UI URL — Chat with your agent
Gateway WebSocket URL & Token — For programmatic access
Web Terminal URL — Browser-based terminal access
File Manager URL — Manage workspace files
Login credentials — For Web Terminal and File Manager
Open the Web UI URL, go to the Chat tab, and start using your agent.
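The Gateway WebSocket URL and token let you drive the agent from scripts instead of the Web UI. The wire protocol is not documented here, so the Python sketch below only shows connection scaffolding; the environment-variable names, auth scheme, and message format are all assumptions to verify against the `novitaclaw launch` output.

```python
import os

def auth_headers(token: str) -> dict:
    """Bearer-token header for the gateway connection (the exact auth
    scheme is an assumption; check the `novitaclaw launch` output)."""
    return {"Authorization": f"Bearer {token}"}

# Connecting might look roughly like this (requires `pip install websockets`):
#
# import asyncio, websockets
#
# async def chat() -> None:
#     url = os.environ["GATEWAY_WS_URL"]    # printed by `novitaclaw launch`
#     token = os.environ["GATEWAY_TOKEN"]
#     async with websockets.connect(url, additional_headers=auth_headers(token)) as ws:
#         await ws.send("Hello, agent")     # message format is an assumption
#         print(await ws.recv())
#
# asyncio.run(chat())
```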
Configuring Models
Your instance comes pre-configured with a Novita-hosted model by default. To customize it:
Go to: Settings → Config → Raw (JSON5 view)
Click “secrets redacted” to reveal the full configuration.
Step 1: Register a Model
Add a new entry under models.providers.novita.models:
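The config snippet itself is not reproduced here. As a rough illustration only, a model entry might look like the JSON5 below; every field name is an assumption, so follow the pre-filled entry already present in your own config as the ground truth.

```json5
{
  models: {
    providers: {
      novita: {
        models: [
          // illustrative shape only; mirror the existing entries in your config
          { id: "minimax/minimax-m2.5", name: "MiniMax M2.5" },
        ],
      },
    },
  },
}
```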
Claude Code
Claude Code is Anthropic’s official CLI agent, primarily designed for Claude models but compatible with Anthropic-API-compatible endpoints like Novita AI. It excels at whole-repository analysis, complex debugging, and agentic coding loops.
Setup:
1. Install Claude Code:
#macOS, Linux, WSL:
curl -fsSL https://claude.ai/install.sh | bash
#Windows PowerShell:
irm https://claude.ai/install.ps1 | iex
#Windows CMD:
curl -fsSL https://claude.ai/install.cmd -o install.cmd && install.cmd && del install.cmd
#Windows requires Git for Windows. Install it first if you don’t have it.
2. Set environment variables:
# Set the Anthropic SDK compatible API endpoint provided by Novita.
export ANTHROPIC_BASE_URL="https://api.novita.ai/anthropic"
export ANTHROPIC_AUTH_TOKEN="<Novita API Key>"
# Set the model provided by Novita.
export ANTHROPIC_MODEL="minimax/minimax-m2.5"
export ANTHROPIC_SMALL_FAST_MODEL="minimax/minimax-m2.5"
3. Start Claude Code in your project:
cd /path/to/project
claude .
Best for: Codebase exploration, multi-step debugging, autonomous feature implementation, integration with VSCode/Cursor via terminal plugins.
4. Local Deployment
MiniMax M2.5’s sparse MoE architecture (229B total, 10B active) makes local deployment viable on high-end consumer hardware or multi-GPU setups. The model requires 457GB at full BF16 precision, but quantization via Unsloth’s GGUF quantizations shrinks this to 101GB (Dynamic 3-bit) or 138GB (Q4_K_M).
Hardware Requirements
Quantization | VRAM Needed | Hardware Example
BF16 (full precision) | 457GB | 6× H100 80GB
Q8_0 | 243GB | 4× H100 80GB
Q6_K | 188GB | 3× H100 80GB
Q4_K_M (recommended) | 138GB | 2× H100 80GB
Q3_K_M | 109GB | 2× H100 80GB
UD-IQ2_XXS (minimum) | 74GB | Single H100 80GB
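As a sanity check on these figures, model size scales roughly with parameter count times bits per weight. The sketch below is a back-of-envelope estimate, not an official sizing; real quants mix precisions per tensor and carry overhead, and the 4.85 effective-bit figure is an assumption chosen to land near Q4_K_M.

```python
def approx_size_gb(total_params_billions: float, bits_per_weight: float) -> float:
    """Rough model size in GB: parameters × bits ÷ 8
    (ignores per-tensor overhead and mixed-precision layers)."""
    return total_params_billions * bits_per_weight / 8

print(round(approx_size_gb(229, 16)))    # BF16: ~458GB, close to the 457GB above
print(round(approx_size_gb(229, 4.85)))  # ~4.85 effective bits lands near Q4_K_M
```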
Installation (llama.cpp)
git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j"$(nproc)"
# Install HF CLI if needed
pip install -U "huggingface_hub[cli]"
# Download a specific quant (example: Q3_K_M)
hf download unsloth/MiniMax-M2.5-GGUF \
--include "Q3_K_M/*" \
--local-dir ./models
# Check files
find ./models -name "*.gguf"
# Run (use the FIRST shard)
./build/bin/llama-cli \
-m ./models/Q3_K_M/MiniMax-M2.5-Q3_K_M-00001-of-00004.gguf \
-p "Write a Python function to check if a number is prime"
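For sustained use, llama.cpp’s llama-server binary (built alongside llama-cli above) exposes an OpenAI-compatible HTTP endpoint under /v1. A minimal stdlib-only Python client sketch follows; the port and model name are assumptions that depend on how you start the server.

```python
import json
import urllib.request

def build_payload(prompt: str, model: str = "minimax-m2.5") -> bytes:
    """JSON body in the OpenAI chat-completions format."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()

def local_chat(prompt: str, port: int = 8080) -> dict:
    """POST to a llama-server instance running on localhost."""
    req = urllib.request.Request(
        f"http://localhost:{port}/v1/chat/completions",
        data=build_payload(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Start the server first, e.g.:
#   ./build/bin/llama-server -m ./models/Q3_K_M/MiniMax-M2.5-Q3_K_M-00001-of-00004.gguf
# then call: local_chat("Write a haiku about GPUs")
```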
Installation on a Cloud GPU (Cost-Effective Option)
Step 1: Register an account
Create your Novita AI account through our website. After registration, navigate to the “Explore” section in the left sidebar to view our GPU offerings and begin your AI development journey.
Step 2: Exploring Templates and GPU Servers
Choose from templates like PyTorch, TensorFlow, or CUDA that match your project needs. Then select your preferred GPU configuration—options include the powerful L40S, RTX 4090 or A100 SXM4, each with different VRAM, RAM, and storage specifications.
Step 3: Tailor Your Deployment
Customize your environment by selecting your preferred operating system and configuration options to ensure optimal performance for your specific AI workloads and development needs.
Novita AI’s Spot instance is a cost-optimized GPU rental system that leverages the platform’s idle or unused GPU capacity. Unlike on-demand instances, which reserve dedicated hardware for stable, continuous usage, Spot instances are interruptible—your job may be paused or terminated if the GPU is reclaimed by the system. Because Spot mode reallocates otherwise idle GPU resources, it is typically 40–60% cheaper than on-demand pricing.
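The discount translates directly into hourly cost. The sketch below is purely illustrative arithmetic; the $2.50/hr on-demand rate is a made-up example, not a quoted price, and 50% is a mid-range assumption within the stated 40–60% band.

```python
def spot_hourly(on_demand_hourly: float, discount: float = 0.5) -> float:
    """Apply a spot discount to an on-demand hourly rate; Spot mode
    is typically 40-60% below on-demand, so 0.5 is a mid-range value."""
    return on_demand_hourly * (1 - discount)

print(spot_hourly(2.50))  # a hypothetical $2.50/hr GPU drops to $1.25/hr
```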
MiniMax M2.5 offers four practical access paths, each optimized for different scenarios. For most developers, Novita AI’s API at $0.30/$1.20 per million tokens provides the fastest path to production—setup takes 2 minutes with OpenAI SDK compatibility. The web playground serves first-time evaluation, while OpenClaw CLI and Claude Code enable terminal-integrated agentic workflows for power users. Self-hosting makes economic sense only above 10 million tokens per day or when strict data privacy requirements prohibit cloud APIs—in which case Q4_K_M quantization on 2× H100 80GB delivers production-ready performance.
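The break-even arithmetic is easy to reproduce. Only the $0.30/$1.20 per-million-token rates come from the text; the 3:1 input-to-output split below is an assumption for illustration.

```python
def daily_api_cost(input_mtok: float, output_mtok: float,
                   in_price: float = 0.30, out_price: float = 1.20) -> float:
    """Daily API spend in dollars, given daily token volumes in millions
    and per-million-token prices."""
    return input_mtok * in_price + output_mtok * out_price

# 10M tokens/day with an assumed 3:1 input:output split:
print(daily_api_cost(7.5, 2.5))  # 7.5*0.30 + 2.5*1.20 = $5.25/day
```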
Frequently Asked Questions
What makes MiniMax M2.5 different from other coding models?
MiniMax M2.5 uses sparse MoE architecture with 229B total parameters but only 10B active per token, achieving 80.2% on SWE-Bench Verified at 8% of Claude Sonnet 4.5’s cost.
Can I run MiniMax M2.5 on a single consumer GPU?
No—the minimum VRAM requirement is 74GB even with aggressive quantization.
Does MiniMax M2.5 support function calling and structured outputs?
Yes—MiniMax M2.5 supports function calling via OpenAI-compatible API format.
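A minimal sketch of the tool-definition format follows; the weather tool and its fields are invented for illustration, and the live call is commented out since it needs an API key.

```python
# OpenAI-compatible tool schema; pass it via the `tools` parameter.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# from openai import OpenAI
# client = OpenAI(api_key="<Your API Key>", base_url="https://api.novita.ai/openai")
# response = client.chat.completions.create(
#     model="minimax/minimax-m2.5",
#     messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
#     tools=tools,
# )
# response.choices[0].message.tool_calls  # structured call(s), if the model used a tool
print(tools[0]["function"]["name"])
```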
Novita AI is an AI & agent cloud platform helping developers and startups build, deploy, and scale models and agentic applications with high performance, reliability, and cost efficiency.