Kimi K2.5, Moonshot AI’s flagship open-source multimodal agentic model, is now available on Novita AI. The model unifies vision and text processing, Thinking and Instant modes, and multi-agent execution in a single system. Built through continual pretraining on approximately 15 trillion mixed visual and text tokens, Kimi K2.5 matches or surpasses many closed-source alternatives on key benchmarks.
Novita AI provides fast, affordable access to Kimi K2.5 through both API integration and an intuitive playground interface.
What is Kimi K2.5?

Moonshot AI’s Flagship Multimodal Agentic Model
Kimi K2.5 is an open-source, native multimodal agentic model developed by Moonshot AI. Built atop Kimi-K2-Base through continual pretraining on approximately 15 trillion mixed visual and text tokens, the model seamlessly integrates vision and language understanding with advanced agentic capabilities.
Unlike traditional multimodal models that bolt vision capabilities onto text-only foundations, Kimi K2.5 was pre-trained on vision-language tokens from the ground up, enabling excellence in visual knowledge, cross-modal reasoning, and agentic tool use grounded in visual inputs.
Architecture Overview
Kimi K2.5 employs a sophisticated Mixture-of-Experts (MoE) architecture:
- Total Parameters: 1 trillion
- Activated Parameters: 32 billion per token
- Number of Experts: 384 (8 selected per token)
- Context Length: 256K tokens
- Vision Encoder: MoonViT with 400M parameters
- Attention Mechanism: MLA (Multi-head Latent Attention)
This architecture enables massive context processing while maintaining computational efficiency through sparse expert activation.
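To put the sparsity in numbers: only a small fraction of the trillion total parameters runs for any given token. A quick back-of-the-envelope check using the figures above:

```python
# Back-of-the-envelope check of MoE sparsity from the published figures.
total_params = 1_000_000_000_000   # 1 trillion total parameters
active_params = 32_000_000_000     # 32 billion activated per token

ratio = active_params / total_params
print(f"Active fraction per token: {ratio:.1%}")  # → 3.2%
```

This is why a 1T-parameter model can serve requests at roughly the compute cost of a 32B dense model.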
Key Features and Capabilities
Dual Operating Modes: Thinking and Instant
Thinking Mode: Designed for complex reasoning with exposed reasoning content. Ideal for mathematical problems, strategic planning, and situations requiring decision-making transparency. Uses extended token budgets (up to 96K tokens) for challenging problems.
Instant Mode: Optimized for speed, returning responses without visible reasoning. Perfect for real-time applications, conversational interfaces, and tasks that prioritize immediate answers.
Developers toggle between modes with the `thinking` parameter; the recommended temperature is 1.0 for Thinking Mode and 0.6 for Instant Mode.
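The mode switch can be sketched as a small helper that sets the `thinking` flag and the recommended temperature together. The exact wire format of the flag (here passed through the OpenAI SDK's `extra_body`) is an assumption to confirm against Novita AI's API documentation:

```python
# Sketch: one place to set mode-dependent request parameters.
# The "thinking" field name comes from the article; how it is carried
# on the wire (extra_body here) is an assumption, not a confirmed spec.
def mode_params(thinking: bool) -> dict:
    """Return chat-completion kwargs for Thinking or Instant mode."""
    return {
        "model": "moonshotai/kimi-k2.5",
        "temperature": 1.0 if thinking else 0.6,  # recommended per mode
        "extra_body": {"thinking": thinking},
    }

# e.g. client.chat.completions.create(messages=..., **mode_params(thinking=True))
```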
Native Multimodality:
Image Understanding: The MoonViT vision encoder (400M parameters) ensures detailed visual comprehension, from document OCR to complex visual reasoning.
Video Processing: Supports video input for applications like content analysis, workflow understanding, and visual instruction following (currently experimental).
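For image input, a message can pair text with an inline image. The content-part layout below follows the common OpenAI chat convention; whether Kimi K2.5 on Novita AI accepts base64 data URLs (versus hosted URLs only) is an assumption to verify:

```python
import base64

# Sketch: an OpenAI-style multimodal user message with an inline image.
def image_message(prompt: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build a user message pairing text with a base64-encoded image."""
    data_url = f"data:{mime};base64,{base64.b64encode(image_bytes).decode()}"
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }

msg = image_message("Describe this chart.", b"\x89PNG")  # stub bytes for illustration
```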
Agent Swarm
Kimi K2.5’s Agent Swarm capability moves beyond single-agent execution to coordinated multi-agent execution, decomposing complex tasks into parallel sub-tasks handled by dynamically instantiated, domain-specific agents.
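The fan-out/fan-in pattern underlying this is simple to sketch. Here each sub-agent is a stub function standing in for a model call; the real Agent Swarm instantiates and prompts domain-specific agents dynamically:

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch of the fan-out/fan-in pattern Agent Swarm automates.
# `agent` is a stub; in practice each sub-task would be a separate
# model call with its own domain-specific prompt and tools.
def run_swarm(subtasks: list[str], agent) -> list[str]:
    """Run one agent per sub-task in parallel and collect results in order."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(agent, subtasks))

results = run_swarm(
    ["pricing", "competitors", "trends"],
    agent=lambda sub: f"findings on {sub}",
)
```

A final aggregation step (another model call over `results`) would then synthesize the parallel findings into one answer.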
Coding with Vision
Kimi K2.5 excels at generating code from visual specifications:
- Convert UI designs and mockups into functional code
- Understand video workflows and generate automation scripts
- Autonomously orchestrate tools for visual data processing
- Perform complex debugging by analyzing screenshots and error states
Interleaved Thinking and Multi-Step Tool Calls
The model chains multiple tool calls together, maintains context across steps, and adjusts approaches based on intermediate results – essential for agentic search, data analysis pipelines, and automated research workflows.
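Under an OpenAI-compatible tool-calling interface, that loop looks roughly like the sketch below. `call_model` and the `search` tool are stand-ins, and the message fields are a simplified form of the OpenAI convention, not Novita AI's exact schema:

```python
import json

# Hypothetical local tools the model may request; names are illustrative.
TOOLS = {"search": lambda q: f"results for {q}"}

def run_agent(messages: list, call_model) -> str:
    """Interleaved loop: call the model, execute requested tools, repeat."""
    while True:
        reply = call_model(messages)      # stand-in for a chat-completions call
        messages.append(reply)
        calls = reply.get("tool_calls") or []
        if not calls:                     # no tool requests: final answer
            return reply["content"]
        for call in calls:                # execute each tool, feed results back
            result = TOOLS[call["name"]](**json.loads(call["arguments"]))
            messages.append({"role": "tool", "tool_call_id": call["id"],
                             "content": result})
```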
Benchmark Performance and Results
Kimi K2.5 achieves state-of-the-art performance across multiple domains, establishing itself as a leader in agentic AI, vision understanding, and coding capabilities.
Global SOTA on Agentic Benchmarks
Kimi K2.5 posts state-of-the-art results on complex agentic tasks, outperforming leading closed-source models including GPT-5.2, Claude Opus 4.5, and Gemini 3 Pro.
| Benchmark | Kimi K2.5 | GPT-5.2 | Claude Opus 4.5 | Gemini 3 Pro |
|---|---|---|---|---|
| Humanity’s Last Exam (Full) | 50.2% | 45.5% | 43.2% | 45.8% |
| BrowseComp | 74.9% | 65.8% | 57.8% | 59.2% |
| DeepSearchQA | 77.1% | 71.3% | 76.1% | 63.2% |
Key Achievement: Kimi K2.5 sets the global state-of-the-art on Humanity’s Last Exam (HLE) full set with 50.2% and BrowseComp with 74.9%, demonstrating superior agentic reasoning and web navigation capabilities.
Open-Source SOTA on Vision Understanding
Kimi K2.5 leads among open-source models on multimodal and vision benchmarks, delivering exceptional performance on image and video understanding tasks.
Image Understanding
| Benchmark | Kimi K2.5 | GPT-5.2 | Claude Opus 4.5 | Gemini 3 Pro |
|---|---|---|---|---|
| MMMU Pro | 78.5% | 79.5% | 74.0% | 81.0% |
| MathVision | 84.2% | 83.0% | 77.1% | 86.1% |
| OmniDocBench 1.5 | 88.8% | 85.7% | 87.7% | 88.5% |
Video Understanding
| Benchmark | Kimi K2.5 | GPT-5.2 | Claude Opus 4.5 | Gemini 3 Pro |
|---|---|---|---|---|
| VideoMMMU | 86.6% | 85.9% | 84.4% | 87.6% |
| LongVideoBench | 79.8% | 76.5% | 67.2% | 77.7% |
Key Achievement: Kimi K2.5 achieves open-source SOTA on MMMU Pro (78.5%) and VideoMMMU (86.6%), excelling at complex multimodal reasoning across images and videos.
Open-Source SOTA on Coding Benchmarks
Kimi K2.5 demonstrates competitive coding performance, particularly excelling when visual understanding is combined with code generation.
| Benchmark | Kimi K2.5 | GPT-5.2 | Claude Opus 4.5 | Gemini 3 Pro |
|---|---|---|---|---|
| SWE-bench Verified | 76.8% | 80.0% | 80.9% | 76.2% |
| SWE-bench Multilingual | 73.0% | 72.0% | 77.5% | 65.0% |
Key Achievement: Kimi K2.5 achieves open-source SOTA on SWE-bench Verified with 76.8%, demonstrating strong real-world software engineering capabilities.
Code with Taste: Aesthetic Design from Visual Inputs
Beyond traditional coding benchmarks, Kimi K2.5 excels at translating visual inputs into aesthetic, functional code. The model can turn chats, images, and videos into expressive websites with sophisticated motion design, enabling developers to rapidly prototype visually compelling interfaces from conceptual designs.
Agent Swarm (Beta): Parallel Processing at Scale
Kimi K2.5’s Agent Swarm technology enables self-directed agents working in parallel at unprecedented scale:
- Up to 100 sub-agents working simultaneously on complex tasks
- 1,500 tool calls orchestrated across parallel workflows
- 4.5× faster than single-agent setups on complex search and research tasks
This breakthrough architecture allows Kimi K2.5 to decompose complex problems into specialized sub-tasks, dramatically improving both speed and accuracy on enterprise-grade agentic workflows.
How to Use Kimi K2.5 on Novita AI
Use the Playground (No Coding Required)
Experiment with Kimi K2.5 instantly through Novita AI’s interactive playground. Upload images or videos, test multimodal prompts, and toggle between Thinking and Instant modes with the full 256K context window.
Integrate via API (For Developers)
```python
from openai import OpenAI

# Point the OpenAI SDK at Novita AI's OpenAI-compatible endpoint.
client = OpenAI(
    api_key="<Your API Key>",
    base_url="https://api.novita.ai/openai",
)

response = client.chat.completions.create(
    model="moonshotai/kimi-k2.5",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, how are you?"},
    ],
    max_tokens=262144,  # output cap; prompt plus output must fit the 256K context
    temperature=0.7,
)

print(response.choices[0].message.content)
```
Connect with Third-Party Platforms
Agent Frameworks: Continue, AnythingLLM, LangChain, Dify, and Langflow through official connectors.
Hugging Face Integration: Novita AI is an official inference provider for seamless ecosystem compatibility.
OpenAI-Compatible API: Works with Cline, Kilo Code, Cursor, Trae, OpenCode, and Qwen Code with minimal code changes.
Anthropic-Compatible API: Integrates with Claude Code for agentic coding workflows.
Real-World Applications and Use Cases
Vibe Coding and Visual Development
Generate code from UI mockups, wireframes, or hand-drawn sketches. Interpret video workflows to create automation scripts, significantly reducing time between design and implementation.
Enterprise Agentic Search
Browse multiple websites autonomously, compare and synthesize information from different sources, verify facts by cross-referencing multiple documents, and manage context effectively even when search results exceed typical token limits. Agent Swarm mode decomposes broad queries into parallel sub-tasks, ideal for competitive intelligence, market research, and academic literature reviews.
Complex Reasoning Tasks
- Mathematical Problem Solving: Near-perfect performance on competition mathematics
- Scientific Reasoning: Graduate-level physics, chemistry, and biology
- Strategic Planning: Multi-step decision-making with transparent reasoning
- Legal Analysis: Document review and case law research with extensive context windows
Multimodal Content Analysis
Extract and analyze information from PDFs, scanned documents, and infographics. Analyze video content for compliance, quality assurance, or moderation. Inspect product images or manufacturing footage to identify defects.
Autonomous Tool Orchestration
Data pipeline automation, research assistants that autonomously gather information and compile reports, customer support handling complex multi-step inquiries, and DevOps automation to manage infrastructure and debug issues.
Conclusion
Kimi K2.5 represents a significant leap forward in open-source multimodal AI, matching or exceeding closed-source alternatives across a wide range of benchmarks. With its native multimodality, 256K context window, dual thinking modes, and Agent Swarm technology, Kimi K2.5 is positioned as a versatile foundation for next-generation AI applications.
Ready to experience the power of Kimi K2.5? Start building with Kimi K2.5 on Novita AI today and unlock the future of open-source multimodal AI.
Novita AI is a leading AI cloud platform that provides developers with easy-to-use APIs and affordable, reliable GPU infrastructure for building and scaling AI applications.