Kimi‑K2‑Instruct Now Available on Novita AI


Kimi‑K2‑Instruct, developed by Moonshot AI, is a next-generation sparse MoE model now accessible via Novita AI. With 1 trillion total parameters, 32 billion activated parameters, and a 128,000-token context window, it is tailored for agentic behaviors, tool use, and long-context reasoning.

Here’s the current pricing of Kimi‑K2‑Instruct on Novita AI: $0.57 per million input tokens and $2.30 per million output tokens.
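At these rates, per-request cost is simple arithmetic. A small helper (hypothetical, with the rates above hard-coded) makes the math explicit:

```python
# Estimated Kimi-K2-Instruct cost on Novita AI at the listed rates:
# $0.57 per million input tokens, $2.30 per million output tokens.
INPUT_RATE = 0.57 / 1_000_000   # USD per input token
OUTPUT_RATE = 2.30 / 1_000_000  # USD per output token

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for a single request."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example: a 100k-token context producing a 2k-token completion.
print(f"${estimate_cost(100_000, 2_000):.4f}")  # → $0.0616
```

Even a request that fills most of the 128K context stays in the cents range, which is what makes long-context agentic workloads practical at this price point.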

What is Kimi K2?

Moonshot AI, a Beijing-based company founded in 2023, is behind the Kimi brand, including K1.5, K2, and the multimodal Kimi‑VL models. Its open-science mission aims to democratize powerful, agentic intelligence.

Kimi K2, developed by Moonshot AI, is a cutting-edge mixture-of-experts (MoE) language model featuring 32 billion activated parameters and a total of 1 trillion parameters. Trained using the Muon optimizer, Kimi K2 delivers outstanding performance in frontier knowledge, reasoning, and coding tasks, all while being finely tuned for advanced agentic capabilities.
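The "sparse" in sparse MoE means only a fraction of the parameters run per token: here roughly 32B of 1T, about 3.2%. The core mechanism is a learned router that picks the top-k experts for each token and gates their outputs. A toy sketch (8 experts, top-2; K2's actual expert counts and routing details differ):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route_top_k(router_logits, k=2):
    """Pick the k highest-scoring experts and renormalize their gate
    weights; only those experts run a forward pass for this token."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in top)
    return {i: probs[i] / total for i in top}

# 8 experts, 2 active per token: only a quarter of expert weights are used.
gates = route_top_k([0.3, 1.9, -0.4, 0.7, 0.1, -1.2, 0.5, 0.0], k=2)
print(gates)  # expert index -> gate weight, summing to 1
```

The payoff is that training and serving cost scale with the 32B activated parameters, not the 1T total, while the full expert pool still stores far more knowledge than a dense 32B model could.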

Key Features

  • Large-Scale Training: A 1T-parameter MoE model pre-trained on 15.5T tokens with zero training instability.
  • MuonClip Optimizer: The Muon optimizer is applied at unprecedented scale, with novel optimization techniques developed to resolve instabilities during scale-up.
  • Agentic Intelligence: Specifically designed for tool use, reasoning, and autonomous problem-solving.
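Public descriptions of MuonClip center on a "qk-clip" step: after an optimizer update, if the largest query-key attention logit exceeds a threshold, the query and key projection weights are rescaled to cap it, preventing logit blow-ups during training. The following is a simplified illustration of that capping idea, not Moonshot AI's actual implementation:

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def qk_clip(Wq, Wk, tokens, tau=100.0):
    """If the largest q·k logit over the batch exceeds tau, rescale both
    projections by sqrt(tau / max_logit), so the product (the logit) is
    scaled by tau / max_logit and the new maximum is ~tau."""
    qs = [matvec(Wq, x) for x in tokens]
    ks = [matvec(Wk, x) for x in tokens]
    max_logit = max(dot(q, k) for q in qs for k in ks)
    if max_logit > tau:
        s = (tau / max_logit) ** 0.5
        Wq = [[s * w for w in row] for row in Wq]
        Wk = [[s * w for w in row] for row in Wk]
    return Wq, Wk
```

Splitting the correction as a square root across both projections keeps the query and key scales balanced rather than shrinking only one side.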

Model Variants

  • Kimi-K2-Base: The foundation model, a strong start for researchers and builders who want full control for fine-tuning and custom solutions.
  • Kimi-K2-Instruct: The post-trained model best for drop-in, general-purpose chat and agentic experiences. It is a reflex-grade model without long thinking.

Instruction Model Evaluation Results

| Benchmark | Metric | Kimi K2 Instruct | DeepSeek-V3-0324 | Qwen3-235B-A22B (non-thinking) | Claude Sonnet 4 (w/o extended thinking) | Claude Opus 4 (w/o extended thinking) | GPT-4.1 | Gemini 2.5 Flash Preview (05-20) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **Coding Tasks** | | | | | | | | |
| LiveCodeBench v6 (Aug 24 – May 25) | Pass@1 | 53.7 | 46.9 | 37.0 | 48.5 | 47.4 | 44.7 | 44.7 |
| OJBench | Pass@1 | 27.1 | 24.0 | 11.3 | 15.3 | 19.6 | 19.5 | 19.5 |
| MultiPL-E | Pass@1 | 85.7 | 83.1 | 78.2 | 88.6 | 89.6 | 86.7 | 85.6 |
| SWE-bench Verified (Agentless Coding) | Single Patch w/o Test (Acc) | 51.8 | 36.6 | 39.4 | 50.2 | 53.0 | 40.8 | 32.6 |
| SWE-bench Verified (Agentic Coding) | Single Attempt (Acc) | 65.8 | 38.8 | 34.4 | 72.7* | 72.5* | 54.6 | – |
| SWE-bench Verified (Agentic Coding) | Multiple Attempts (Acc) | 71.6 | – | – | 80.2 | 79.4* | – | – |
| SWE-bench Multilingual (Agentic Coding) | Single Attempt (Acc) | 47.3 | 25.8 | 20.9 | 51.0 | – | 31.5 | – |
| TerminalBench | Inhouse Framework (Acc) | 30.0 | – | – | 35.5 | 43.2 | 8.3 | – |
| TerminalBench | Terminus (Acc) | 25.0 | 16.3 | 6.6 | – | – | 30.3 | 16.8 |
| Aider-Polyglot | Acc | 60.0 | 55.1 | 61.8 | 56.4 | 70.7 | 52.4 | 44.0 |
| **Tool Use Tasks** | | | | | | | | |
| Tau2 retail | Avg@4 | 70.6 | 69.1 | 57.0 | 75.0 | 81.8 | 74.8 | 64.3 |
| Tau2 airline | Avg@4 | 56.5 | 39.0 | 26.5 | 55.5 | 60.0 | 54.5 | 42.5 |
| Tau2 telecom | Avg@4 | 65.8 | 32.5 | 22.1 | 45.2 | 57.0 | 38.6 | 16.9 |
| AceBench | Acc | 76.5 | 72.7 | 70.5 | 76.2 | 75.6 | 80.1 | 74.5 |
| **Math & STEM Tasks** | | | | | | | | |
| AIME 2024 | Avg@64 | 69.6 | 59.4* | 40.1* | 43.4 | 48.2 | 46.5 | 61.3 |
| AIME 2025 | Avg@64 | 49.5 | 46.7 | 24.7* | 33.1* | 33.9* | 37.0 | 46.6 |
| MATH-500 | Acc | 97.4 | 94.0* | 91.2* | 94.0 | 94.4 | 92.4 | 95.4 |
| HMMT 2025 | Avg@32 | 38.8 | 27.5 | 11.9 | 15.9 | 15.9 | 19.4 | 34.7 |
| CNMO 2024 | Avg@16 | 74.3 | 74.7 | 48.6 | 60.4 | 57.6 | 56.6 | 75.0 |
| PolyMath-en | Avg@4 | 65.1 | 59.5 | 51.9 | 52.8 | 49.8 | 54.0 | 49.9 |
| ZebraLogic | Acc | 89.0 | 84.0 | 37.7* | 73.7 | 59.3 | 58.5 | 57.9 |
| AutoLogi | Acc | 89.5 | 88.9 | 83.3 | 89.8 | 86.1 | 88.2 | 84.1 |
| GPQA-Diamond | Avg@8 | 75.1 | 68.4* | 62.9* | 70.0* | 74.9* | 66.3 | 68.2 |
| SuperGPQA | Acc | 57.2 | 53.7 | 50.2 | 55.7 | 56.5 | 50.8 | 49.6 |
| Humanity’s Last Exam (Text Only) | – | 4.7 | 5.2 | 5.7 | 5.8 | 7.1 | 3.7 | 5.6 |
| **General Tasks** | | | | | | | | |
| MMLU | EM | 89.5 | 89.4 | 87.0 | 91.5 | 92.9 | 90.4 | 90.1 |
| MMLU-Redux | EM | 92.7 | 90.5 | 89.2 | 93.6 | 94.2 | 92.4 | 90.6 |
| MMLU-Pro | EM | 81.1 | 81.2* | 77.3 | 83.7 | 86.6 | 81.8 | 79.4 |
| IFEval | Prompt Strict | 89.8 | 81.1 | 83.2* | 87.6 | 87.4 | 88.0 | 84.3 |
| Multi-Challenge | Acc | 54.1 | 31.4 | 34.0 | 46.8 | 49.0 | 36.4 | 39.5 |
| SimpleQA | Correct | 31.0 | 27.7 | 13.2 | 15.9 | 22.8 | 42.3 | 23.3 |
| Livebench | Pass@1 | 76.4 | 72.4 | 67.6 | 74.8 | 74.6 | 69.8 | 67.8 |

Kimi-K2 Supported Engines and Minimum Hardware

Supported Engines

  • vLLM
  • SGLang
  • TensorRT-LLM
  • KTransformers

Minimum Hardware

| Hardware | Minimum Requirement |
| --- | --- |
| GPU Type | H200 |
| Cluster Size | 16 GPUs (minimum) |
| Parallelism Modes | Tensor Parallelism (TP-16) or Data Parallelism + Expert Parallelism |
| Weights Format | FP8 weights with 128k seqlen |
Deployment examples for vLLM and SGLang can be found in the Model Deployment Guide.
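A back-of-envelope check shows why a 16-GPU H200 cluster is the floor. The figures below are assumptions (FP8 ≈ 1 byte per parameter, 141 GB of HBM per H200) and the estimate ignores KV cache, activations, and framework overhead:

```python
# Rough weight-memory estimate for serving a 1T-parameter model in FP8.
params = 1.0e12          # 1T total parameters
bytes_per_param = 1      # FP8 stores one byte per weight (assumption)
h200_mem_gb = 141        # advertised HBM capacity per H200 (assumption)
gpus = 16                # the stated minimum cluster size

weights_gb = params * bytes_per_param / 1e9   # ~1000 GB of weights total
per_gpu_gb = weights_gb / gpus                # weight shard per GPU under TP-16
headroom_gb = h200_mem_gb - per_gpu_gb        # left for KV cache, activations

print(f"{per_gpu_gb:.1f} GB weights/GPU, {headroom_gb:.1f} GB headroom")
# → 62.5 GB weights/GPU, 78.5 GB headroom
```

At 128k sequence lengths the KV cache eats into that headroom quickly, which is why fewer or smaller GPUs don't fit even though the per-GPU weight shard alone looks modest.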

How to Access Kimi‑K2‑Instruct on Novita AI

Getting started with Kimi‑K2‑Instruct is fast, simple, and affordable on Novita AI.

Use the Playground (No Coding Required)

Instant Access: Sign up, and start experimenting with Kimi‑K2‑Instruct and other top models in seconds.

Interactive UI: Experience the model through the intuitive interface.

Model Comparison: Effortlessly switch between Kimi‑K2‑Instruct and other top models to find the perfect fit for your needs.

Integrate via API (For Developers)

Seamlessly connect Kimi‑K2‑Instruct to your applications, workflows, or chatbots with Novita AI’s unified REST API–no need to manage model weights or infrastructure.

Option 1: Direct API Integration (Python Example)

To get started, simply use the code snippet below:

from openai import OpenAI
  
client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",
    api_key="<YOUR_NOVITA_API_KEY>",  # replace with your Novita AI API key
)

model = "moonshotai/kimi-k2-instruct"
stream = True # or False
max_tokens = 65536
system_content = "Be a helpful assistant"
temperature = 1
top_p = 1
min_p = 0
top_k = 50
presence_penalty = 0
frequency_penalty = 0
repetition_penalty = 1
response_format = { "type": "text" }

chat_completion_res = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "system",
            "content": system_content,
        },
        {
            "role": "user",
            "content": "Hi there!",
        }
    ],
    stream=stream,
    max_tokens=max_tokens,
    temperature=temperature,
    top_p=top_p,
    presence_penalty=presence_penalty,
    frequency_penalty=frequency_penalty,
    response_format=response_format,
    extra_body={
      "top_k": top_k,
      "repetition_penalty": repetition_penalty,
      "min_p": min_p
    }
  )

if stream:
    for chunk in chat_completion_res:
        print(chunk.choices[0].delta.content or "", end="")
else:
    print(chat_completion_res.choices[0].message.content)

Key Features:

  • Unified endpoint: /v3/openai supports OpenAI’s Chat Completions API format.
  • Flexible controls: Adjust temperature, top-p, penalties, and more for tailored results.
  • Streaming & batching: Choose your preferred response mode.

Option 2: Multi-Agent Workflows with OpenAI Agents SDK

Build advanced multi-agent systems by integrating Novita AI with the OpenAI Agents SDK:

Plug-and-play: Use Kimi‑K2‑Instruct in any OpenAI Agents workflow.

Supports handoffs, routing, and tool use: Design agents that can delegate tasks, route requests between specialists, or run functions.

Python integration: Simply point the SDK to Novita’s endpoint (https://api.novita.ai/v3/openai) and use your API key for seamless agent workflows.

Option 3: Connect Kimi‑K2‑Instruct API on Third-Party Platforms

Hugging Face: Use Kimi‑K2‑Instruct in Spaces, pipelines, or with the Transformers library via Novita AI endpoints.

Agent & Orchestration Frameworks: Easily connect Novita AI with partner platforms like Continue, AnythingLLM, LangChain, Dify and Langflow through official connectors and step-by-step integration guides.

OpenAI-Compatible API: Enjoy hassle-free migration and integration with tools such as Cline, Trae, and Cursor that are designed for the OpenAI API standard.

Conclusion

Kimi-K2-Instruct is a powerful, open-access, 1T-parameter MoE model, pushing the frontier in coding, reasoning, and agentic AI.

Now available on Novita AI, it blends massive scale, tool-use intelligence, and long-context processing — all deployable with efficient inference infrastructure. For developers and researchers building the next generation of AI assistants, agents, and reasoning engines, Kimi-K2-Instruct offers a cutting-edge foundation that is powerful, flexible, and production-ready.

Try the Kimi-K2-Instruct Demo on Novita AI !

Novita AI is an AI cloud platform that offers developers an easy way to deploy AI models through a simple API, while also providing affordable and reliable GPU cloud infrastructure for building and scaling.

