Kimi K2.5, Moonshot AI’s flagship open-source multimodal agentic model, is now available on Novita AI. The model unifies vision and text processing, Thinking and Instant modes, and multi-agent execution in a single system. Built through continual pretraining on approximately 15 trillion mixed visual and text tokens, Kimi K2.5 matches or surpasses many closed-source alternatives on key benchmarks.
Novita AI provides fast, affordable access to Kimi K2.5 through both API integration and an intuitive playground interface.
What is Kimi K2.5?

Moonshot AI’s Flagship Multimodal Agentic Model
Kimi K2.5 is an open-source, native multimodal agentic model developed by Moonshot AI. Built atop Kimi-K2-Base through continual pretraining on approximately 15 trillion mixed visual and text tokens, the model seamlessly integrates vision and language understanding with advanced agentic capabilities.
Unlike traditional multimodal models that bolt vision capabilities onto text-only foundations, Kimi K2.5 was pre-trained on vision-language tokens from the ground up, enabling excellence in visual knowledge, cross-modal reasoning, and agentic tool use grounded in visual inputs.
Architecture Overview
Kimi K2.5 employs a sophisticated Mixture-of-Experts (MoE) architecture:
- Total Parameters: 1 trillion
- Activated Parameters: 32 billion per token
- Number of Experts: 384 (8 selected per token)
- Context Length: 256K tokens
- Vision Encoder: MoonViT with 400M parameters
- Attention Mechanism: MLA (Multi-head Latent Attention)
This architecture enables massive context processing while maintaining computational efficiency through sparse expert activation.
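To put the sparsity in numbers: only a small fraction of the trillion total parameters runs for any given token. A quick back-of-the-envelope check using the figures above:

```python
# Back-of-the-envelope check of MoE sparsity from the published figures.
total_params = 1_000_000_000_000   # 1 trillion total parameters
active_params = 32_000_000_000     # 32 billion activated per token

ratio = active_params / total_params
print(f"Active fraction per token: {ratio:.1%}")  # → 3.2%
```

This is why a 1T-parameter model can serve requests at roughly the compute cost of a 32B dense model.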
Key Features and Capabilities
Dual Operating Modes: Thinking and Instant
Thinking Mode: Designed for complex reasoning with exposed reasoning content. Ideal for mathematical problems, strategic planning, and situations requiring decision-making transparency. Uses extended token budgets (up to 96K tokens) for challenging problems.
Instant Mode: Optimized for speed, returning responses without visible reasoning. Perfect for real-time applications, conversational interfaces, and tasks that prioritize immediate answers.
Developers toggle between modes with the `thinking` parameter; the recommended temperature is 1.0 for Thinking Mode and 0.6 for Instant Mode.
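The mode switch can be sketched as a small helper that sets the `thinking` flag and the recommended temperature together. The exact wire format of the flag (here passed through the OpenAI SDK's `extra_body`) is an assumption to confirm against Novita AI's API documentation:

```python
# Sketch: one place to set mode-dependent request parameters.
# The "thinking" field name comes from the article; how it is carried
# on the wire (extra_body here) is an assumption, not a confirmed spec.
def mode_params(thinking: bool) -> dict:
    """Return chat-completion kwargs for Thinking or Instant mode."""
    return {
        "model": "moonshotai/kimi-k2.5",
        "temperature": 1.0 if thinking else 0.6,  # recommended per mode
        "extra_body": {"thinking": thinking},
    }

# e.g. client.chat.completions.create(messages=..., **mode_params(thinking=True))
```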
Native Multimodality:
Image Understanding: The MoonViT vision encoder (400M parameters) ensures detailed visual comprehension, from document OCR to complex visual reasoning.
Video Processing: Supports video input for applications like content analysis, workflow understanding, and visual instruction following (currently experimental).
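For image input, a message can pair text with an inline image. The content-part layout below follows the common OpenAI chat convention; whether Kimi K2.5 on Novita AI accepts base64 data URLs (versus hosted URLs only) is an assumption to verify:

```python
import base64

# Sketch: an OpenAI-style multimodal user message with an inline image.
def image_message(prompt: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build a user message pairing text with a base64-encoded image."""
    data_url = f"data:{mime};base64,{base64.b64encode(image_bytes).decode()}"
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }

msg = image_message("Describe this chart.", b"\x89PNG")  # stub bytes for illustration
```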
Agent Swarm
Kimi K2.5’s Agent Swarm capability moves beyond single-agent execution to coordinated multi-agent execution, decomposing complex tasks into parallel sub-tasks handled by dynamically instantiated, domain-specific agents.
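The fan-out/fan-in pattern underlying this is simple to sketch. Here each sub-agent is a stub function standing in for a model call; the real Agent Swarm instantiates and prompts domain-specific agents dynamically:

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch of the fan-out/fan-in pattern Agent Swarm automates.
# `agent` is a stub; in practice each sub-task would be a separate
# model call with its own domain-specific prompt and tools.
def run_swarm(subtasks: list[str], agent) -> list[str]:
    """Run one agent per sub-task in parallel and collect results in order."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(agent, subtasks))

results = run_swarm(
    ["pricing", "competitors", "trends"],
    agent=lambda sub: f"findings on {sub}",
)
```

A final aggregation step (another model call over `results`) would then synthesize the parallel findings into one answer.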
Coding with Vision
Kimi K2.5 excels at generating code from visual specifications:
- Convert UI designs and mockups into functional code
- Understand video workflows and generate automation scripts
- Autonomously orchestrate tools for visual data processing
- Perform complex debugging by analyzing screenshots and error states
Interleaved Thinking and Multi-Step Tool Calls
The model chains multiple tool calls together, maintains context across steps, and adjusts approaches based on intermediate results – essential for agentic search, data analysis pipelines, and automated research workflows.
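Under an OpenAI-compatible tool-calling interface, that loop looks roughly like the sketch below. `call_model` and the `search` tool are stand-ins, and the message fields are a simplified form of the OpenAI convention, not Novita AI's exact schema:

```python
import json

# Hypothetical local tools the model may request; names are illustrative.
TOOLS = {"search": lambda q: f"results for {q}"}

def run_agent(messages: list, call_model) -> str:
    """Interleaved loop: call the model, execute requested tools, repeat."""
    while True:
        reply = call_model(messages)      # stand-in for a chat-completions call
        messages.append(reply)
        calls = reply.get("tool_calls") or []
        if not calls:                     # no tool requests: final answer
            return reply["content"]
        for call in calls:                # execute each tool, feed results back
            result = TOOLS[call["name"]](**json.loads(call["arguments"]))
            messages.append({"role": "tool", "tool_call_id": call["id"],
                             "content": result})
```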
Benchmark Performance and Results
Kimi K2.5 achieves state-of-the-art performance across multiple domains, establishing itself as a leader in agentic AI, vision understanding, and coding capabilities.
Global SOTA on Agentic Benchmarks
Kimi K2.5 posts state-of-the-art results on complex agentic tasks, outperforming leading closed-source models including GPT-5.2, Claude Opus 4.5, and Gemini 3 Pro.
| Benchmark | Kimi K2.5 | GPT-5.2 | Claude Opus 4.5 | Gemini 3 Pro |
|---|---|---|---|---|
| Humanity’s Last Exam (Full) | 50.2% | 45.5% | 43.2% | 45.8% |
| BrowseComp | 74.9% | 65.8% | 57.8% | 59.2% |
| DeepSearchQA | 77.1% | 71.3% | 76.1% | 63.2% |
Key Achievement: Kimi K2.5 sets the global state-of-the-art on Humanity’s Last Exam (HLE) full set with 50.2% and BrowseComp with 74.9%, demonstrating superior agentic reasoning and web navigation capabilities.
Open-Source SOTA on Vision Understanding
Kimi K2.5 leads among open-source models on multimodal and vision benchmarks, delivering exceptional performance on image and video understanding tasks.
Image Understanding
| Benchmark | Kimi K2.5 | GPT-5.2 | Claude Opus 4.5 | Gemini 3 Pro |
|---|---|---|---|---|
| MMMU Pro | 78.5% | 79.5% | 74.0% | 81.0% |
| MathVision | 84.2% | 83.0% | 77.1% | 86.1% |
| OmniDocBench 1.5 | 88.8% | 85.7% | 87.7% | 88.5% |
Video Understanding
| Benchmark | Kimi K2.5 | GPT-5.2 | Claude Opus 4.5 | Gemini 3 Pro |
|---|---|---|---|---|
| VideoMMMU | 86.6% | 85.9% | 84.4% | 87.6% |
| LongVideoBench | 79.8% | 76.5% | 67.2% | 77.7% |
Key Achievement: Kimi K2.5 achieves open-source SOTA on MMMU Pro (78.5%) and VideoMMMU (86.6%), excelling at complex multimodal reasoning across images and videos.
Open-Source SOTA on Coding Benchmarks
Kimi K2.5 demonstrates competitive coding performance, particularly excelling when visual understanding is combined with code generation.
| Benchmark | Kimi K2.5 | GPT-5.2 | Claude Opus 4.5 | Gemini 3 Pro |
|---|---|---|---|---|
| SWE-bench Verified | 76.8% | 80.0% | 80.9% | 76.2% |
| SWE-bench Multilingual | 73.0% | 72.0% | 77.5% | 65.0% |
Key Achievement: Kimi K2.5 achieves open-source SOTA on SWE-bench Verified with 76.8%, demonstrating strong real-world software engineering capabilities.
Code with Taste: Aesthetic Design from Visual Inputs
Beyond traditional coding benchmarks, Kimi K2.5 excels at translating visual inputs into aesthetic, functional code. The model can turn chats, images, and videos into expressive websites with sophisticated motion design, enabling developers to rapidly prototype visually compelling interfaces from conceptual designs.
Agent Swarm (Beta): Parallel Processing at Scale
Kimi K2.5’s Agent Swarm technology enables self-directed agents working in parallel at unprecedented scale:
- Up to 100 sub-agents working simultaneously on complex tasks
- 1,500 tool calls orchestrated across parallel workflows
- 4.5× faster than single-agent setups on complex search and research tasks
This breakthrough architecture allows Kimi K2.5 to decompose complex problems into specialized sub-tasks, dramatically improving both speed and accuracy on enterprise-grade agentic workflows.
How to Use Kimi K2.5 on Novita AI
Use the Playground (No Coding Required)
Experiment with Kimi K2.5 instantly through Novita AI’s interactive playground. Upload images or videos, test multimodal prompts, and toggle between Thinking and Instant modes with the full 256K context window.
Integrate via API (For Developers)
```python
from openai import OpenAI

# Point the OpenAI SDK at Novita AI's OpenAI-compatible endpoint.
client = OpenAI(
    api_key="<Your API Key>",
    base_url="https://api.novita.ai/openai",
)

response = client.chat.completions.create(
    model="moonshotai/kimi-k2.5",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, how are you?"},
    ],
    max_tokens=262144,  # output cap; prompt plus output must fit the 256K context
    temperature=0.7,
)

print(response.choices[0].message.content)
```
Connect with Third-Party Platforms
Agent Frameworks: Continue, AnythingLLM, LangChain, Dify, and Langflow through official connectors.
Hugging Face Integration: Novita AI is an official inference provider for seamless ecosystem compatibility.
OpenAI-Compatible API: Works with Cline, Kilo Code, Cursor, Trae, OpenCode, and Qwen Code with minimal code changes.
Anthropic-Compatible API: Integrates with Claude Code for agentic coding workflows.
Real-World Applications and Use Cases
Vibe Coding and Visual Development
Generate code from UI mockups, wireframes, or hand-drawn sketches. Interpret video workflows to create automation scripts, significantly reducing time between design and implementation.
Enterprise Agentic Search
Browse multiple websites autonomously, compare and synthesize information from different sources, verify facts by cross-referencing multiple documents, and manage context effectively even when search results exceed typical token limits. Agent Swarm mode decomposes broad queries into parallel sub-tasks, ideal for competitive intelligence, market research, and academic literature reviews.
Complex Reasoning Tasks
- Mathematical Problem Solving: Near-perfect performance on competition mathematics
- Scientific Reasoning: Graduate-level physics, chemistry, and biology
- Strategic Planning: Multi-step decision-making with transparent reasoning
- Legal Analysis: Document review and case law research with extensive context windows
Multimodal Content Analysis
Extract and analyze information from PDFs, scanned documents, and infographics. Analyze video content for compliance, quality assurance, or moderation. Inspect product images or manufacturing footage to identify defects.
Autonomous Tool Orchestration
Data pipeline automation, research assistants that autonomously gather information and compile reports, customer support handling complex multi-step inquiries, and DevOps automation to manage infrastructure and debug issues.
Conclusion
Kimi K2.5 represents a significant leap forward in open-source multimodal AI, matching or exceeding closed-source alternatives across a wide range of benchmarks. With its native multimodality, 256K context window, dual thinking modes, and Agent Swarm technology, Kimi K2.5 is positioned as a versatile foundation for next-generation AI applications.
Ready to experience the power of Kimi K2.5? Start building with Kimi K2.5 on Novita AI today and unlock the future of open-source multimodal AI.
Novita AI is a leading AI cloud platform that provides developers with easy-to-use APIs and affordable, reliable GPU infrastructure for building and scaling AI applications.