Kimi‑K2‑Instruct Now Available on Novita AI


Kimi‑K2‑Instruct, developed by Moonshot AI, is a next-generation sparse MoE model now accessible via Novita AI. With 1 trillion total parameters, 32 billion activated parameters, and a 128,000-token context window, it is tailored for agentic behaviors, tool use, and long-context reasoning.

Here’s the current pricing of Kimi‑K2‑Instruct on Novita AI: $0.57 per million input tokens and $2.30 per million output tokens.
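At these rates, per-request cost is simple arithmetic. A small helper (hypothetical, with the rates above hard-coded) makes the math explicit:

```python
# Estimated Kimi-K2-Instruct cost on Novita AI at the listed rates:
# $0.57 per million input tokens, $2.30 per million output tokens.
INPUT_RATE = 0.57 / 1_000_000   # USD per input token
OUTPUT_RATE = 2.30 / 1_000_000  # USD per output token

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for a single request."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example: a 100k-token context producing a 2k-token completion.
print(f"${estimate_cost(100_000, 2_000):.4f}")  # → $0.0616
```

Even a request that fills most of the 128K context stays in the cents range, which is what makes long-context agentic workloads practical at this price point.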

What is Kimi K2?

Moonshot AI, a Beijing-based company founded in 2023, is behind the Kimi brand, including K1.5, K2, and the multimodal Kimi‑VL models. Its open-science mission aims to democratize powerful, agentic intelligence.

Kimi K2, developed by Moonshot AI, is a cutting-edge mixture-of-experts (MoE) language model featuring 32 billion activated parameters and a total of 1 trillion parameters. Trained using the Muon optimizer, Kimi K2 delivers outstanding performance in frontier knowledge, reasoning, and coding tasks, all while being finely tuned for advanced agentic capabilities.
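The "sparse" in sparse MoE means only a fraction of the parameters run per token: here roughly 32B of 1T, about 3.2%. The core mechanism is a learned router that picks the top-k experts for each token and gates their outputs. A toy sketch (8 experts, top-2; K2's actual expert counts and routing details differ):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route_top_k(router_logits, k=2):
    """Pick the k highest-scoring experts and renormalize their gate
    weights; only those experts run a forward pass for this token."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in top)
    return {i: probs[i] / total for i in top}

# 8 experts, 2 active per token: only a quarter of expert weights are used.
gates = route_top_k([0.3, 1.9, -0.4, 0.7, 0.1, -1.2, 0.5, 0.0], k=2)
print(gates)  # expert index -> gate weight, summing to 1
```

The payoff is that training and serving cost scale with the 32B activated parameters, not the 1T total, while the full expert pool still stores far more knowledge than a dense 32B model could.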

Key Features

  • Large-Scale Training: A 1T-parameter MoE model pre-trained on 15.5T tokens with zero training instability.
  • MuonClip Optimizer: The Muon optimizer is applied at unprecedented scale, with novel optimization techniques developed to resolve instabilities during scale-up.
  • Agentic Intelligence: Specifically designed for tool use, reasoning, and autonomous problem-solving.
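Public descriptions of MuonClip center on a "qk-clip" step: after an optimizer update, if the largest query-key attention logit exceeds a threshold, the query and key projection weights are rescaled to cap it, preventing logit blow-ups during training. The following is a simplified illustration of that capping idea, not Moonshot AI's actual implementation:

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def qk_clip(Wq, Wk, tokens, tau=100.0):
    """If the largest q·k logit over the batch exceeds tau, rescale both
    projections by sqrt(tau / max_logit), so the product (the logit) is
    scaled by tau / max_logit and the new maximum is ~tau."""
    qs = [matvec(Wq, x) for x in tokens]
    ks = [matvec(Wk, x) for x in tokens]
    max_logit = max(dot(q, k) for q in qs for k in ks)
    if max_logit > tau:
        s = (tau / max_logit) ** 0.5
        Wq = [[s * w for w in row] for row in Wq]
        Wk = [[s * w for w in row] for row in Wk]
    return Wq, Wk
```

Splitting the correction as a square root across both projections keeps the query and key scales balanced rather than shrinking only one side.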

Model Variants

  • Kimi-K2-Base: The foundation model, a strong start for researchers and builders who want full control for fine-tuning and custom solutions.
  • Kimi-K2-Instruct: The post-trained model best for drop-in, general-purpose chat and agentic experiences. It is a reflex-grade model without long thinking.

Instruction Model Evaluation Results

| Benchmark | Metric | Kimi K2 Instruct | DeepSeek-V3-0324 | Qwen3-235B-A22B (non-thinking) | Claude Sonnet 4 (w/o extended thinking) | Claude Opus 4 (w/o extended thinking) | GPT-4.1 | Gemini 2.5 Flash Preview (05-20) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **Coding Tasks** | | | | | | | | |
| LiveCodeBench v6 (Aug 24 – May 25) | Pass@1 | 53.7 | 46.9 | 37.0 | 48.5 | 47.4 | 44.7 | 44.7 |
| OJBench | Pass@1 | 27.1 | 24.0 | 11.3 | 15.3 | 19.6 | 19.5 | 19.5 |
| MultiPL-E | Pass@1 | 85.7 | 83.1 | 78.2 | 88.6 | 89.6 | 86.7 | 85.6 |
| SWE-bench Verified (Agentless Coding) | Single Patch w/o Test (Acc) | 51.8 | 36.6 | 39.4 | 50.2 | 53.0 | 40.8 | 32.6 |
| SWE-bench Verified (Agentic Coding) | Single Attempt (Acc) | 65.8 | 38.8 | 34.4 | 72.7* | 72.5* | 54.6 | – |
| SWE-bench Verified (Agentic Coding) | Multiple Attempts (Acc) | 71.6 | – | – | 80.2 | 79.4* | – | – |
| SWE-bench Multilingual (Agentic Coding) | Single Attempt (Acc) | 47.3 | 25.8 | 20.9 | 51.0 | – | 31.5 | – |
| TerminalBench | Inhouse Framework (Acc) | 30.0 | – | – | 35.5 | 43.2 | 8.3 | – |
| TerminalBench | Terminus (Acc) | 25.0 | 16.3 | 6.6 | – | – | 30.3 | 16.8 |
| Aider-Polyglot | Acc | 60.0 | 55.1 | 61.8 | 56.4 | 70.7 | 52.4 | 44.0 |
| **Tool Use Tasks** | | | | | | | | |
| Tau2 retail | Avg@4 | 70.6 | 69.1 | 57.0 | 75.0 | 81.8 | 74.8 | 64.3 |
| Tau2 airline | Avg@4 | 56.5 | 39.0 | 26.5 | 55.5 | 60.0 | 54.5 | 42.5 |
| Tau2 telecom | Avg@4 | 65.8 | 32.5 | 22.1 | 45.2 | 57.0 | 38.6 | 16.9 |
| AceBench | Acc | 76.5 | 72.7 | 70.5 | 76.2 | 75.6 | 80.1 | 74.5 |
| **Math & STEM Tasks** | | | | | | | | |
| AIME 2024 | Avg@64 | 69.6 | 59.4* | 40.1* | 43.4 | 48.2 | 46.5 | 61.3 |
| AIME 2025 | Avg@64 | 49.5 | 46.7 | 24.7* | 33.1* | 33.9* | 37.0 | 46.6 |
| MATH-500 | Acc | 97.4 | 94.0* | 91.2* | 94.0 | 94.4 | 92.4 | 95.4 |
| HMMT 2025 | Avg@32 | 38.8 | 27.5 | 11.9 | 15.9 | 15.9 | 19.4 | 34.7 |
| CNMO 2024 | Avg@16 | 74.3 | 74.7 | 48.6 | 60.4 | 57.6 | 56.6 | 75.0 |
| PolyMath-en | Avg@4 | 65.1 | 59.5 | 51.9 | 52.8 | 49.8 | 54.0 | 49.9 |
| ZebraLogic | Acc | 89.0 | 84.0 | 37.7* | 73.7 | 59.3 | 58.5 | 57.9 |
| AutoLogi | Acc | 89.5 | 88.9 | 83.3 | 89.8 | 86.1 | 88.2 | 84.1 |
| GPQA-Diamond | Avg@8 | 75.1 | 68.4* | 62.9* | 70.0* | 74.9* | 66.3 | 68.2 |
| SuperGPQA | Acc | 57.2 | 53.7 | 50.2 | 55.7 | 56.5 | 50.8 | 49.6 |
| Humanity’s Last Exam (Text Only) | – | 4.7 | 5.2 | 5.7 | 5.8 | 7.1 | 3.7 | 5.6 |
| **General Tasks** | | | | | | | | |
| MMLU | EM | 89.5 | 89.4 | 87.0 | 91.5 | 92.9 | 90.4 | 90.1 |
| MMLU-Redux | EM | 92.7 | 90.5 | 89.2 | 93.6 | 94.2 | 92.4 | 90.6 |
| MMLU-Pro | EM | 81.1 | 81.2* | 77.3 | 83.7 | 86.6 | 81.8 | 79.4 |
| IFEval | Prompt Strict | 89.8 | 81.1 | 83.2* | 87.6 | 87.4 | 88.0 | 84.3 |
| Multi-Challenge | Acc | 54.1 | 31.4 | 34.0 | 46.8 | 49.0 | 36.4 | 39.5 |
| SimpleQA | Correct | 31.0 | 27.7 | 13.2 | 15.9 | 22.8 | 42.3 | 23.3 |
| Livebench | Pass@1 | 76.4 | 72.4 | 67.6 | 74.8 | 74.6 | 69.8 | 67.8 |

Kimi-K2 Supported Engines and Minimum Hardware

Supported Engines

  • vLLM
  • SGLang
  • TensorRT-LLM
  • KTransformers

Minimum Hardware

| Hardware | Minimum Requirement |
| --- | --- |
| GPU Type | H200 |
| Cluster Size | 16 GPUs (minimum) |
| Parallelism Modes | Tensor Parallelism (TP-16) or Data Parallelism + Expert Parallelism |
| Weights Format | FP8 weights with 128k seqlen |
Deployment examples for vLLM and SGLang can be found in the Model Deployment Guide.
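A back-of-envelope check shows why a 16-GPU H200 cluster is the floor. The figures below are assumptions (FP8 ≈ 1 byte per parameter, 141 GB of HBM per H200) and the estimate ignores KV cache, activations, and framework overhead:

```python
# Rough weight-memory estimate for serving a 1T-parameter model in FP8.
params = 1.0e12          # 1T total parameters
bytes_per_param = 1      # FP8 stores one byte per weight (assumption)
h200_mem_gb = 141        # advertised HBM capacity per H200 (assumption)
gpus = 16                # the stated minimum cluster size

weights_gb = params * bytes_per_param / 1e9   # ~1000 GB of weights total
per_gpu_gb = weights_gb / gpus                # weight shard per GPU under TP-16
headroom_gb = h200_mem_gb - per_gpu_gb        # left for KV cache, activations

print(f"{per_gpu_gb:.1f} GB weights/GPU, {headroom_gb:.1f} GB headroom")
# → 62.5 GB weights/GPU, 78.5 GB headroom
```

At 128k sequence lengths the KV cache eats into that headroom quickly, which is why fewer or smaller GPUs don't fit even though the per-GPU weight shard alone looks modest.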

How to Access Kimi‑K2‑Instruct on Novita AI

Getting started with Kimi‑K2‑Instruct is fast, simple, and affordable on Novita AI.

Use the Playground (No Coding Required)

Instant Access: Sign up, and start experimenting with Kimi‑K2‑Instruct and other top models in seconds.

Interactive UI: Experience the model through the intuitive interface.

Model Comparison: Effortlessly switch between Kimi‑K2‑Instruct and other top models to find the perfect fit for your needs.

Integrate via API (For Developers)

Seamlessly connect Kimi‑K2‑Instruct to your applications, workflows, or chatbots with Novita AI’s unified REST API–no need to manage model weights or infrastructure.

Option 1: Direct API Integration (Python Example)

To get started, simply use the code snippet below:

from openai import OpenAI
  
client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",
    api_key="<YOUR_NOVITA_API_KEY>",  # replace with your Novita AI API key
)

model = "moonshotai/kimi-k2-instruct"
stream = True # or False
max_tokens = 65536
system_content = "Be a helpful assistant"
temperature = 1
top_p = 1
min_p = 0
top_k = 50
presence_penalty = 0
frequency_penalty = 0
repetition_penalty = 1
response_format = { "type": "text" }

chat_completion_res = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "system",
            "content": system_content,
        },
        {
            "role": "user",
            "content": "Hi there!",
        }
    ],
    stream=stream,
    max_tokens=max_tokens,
    temperature=temperature,
    top_p=top_p,
    presence_penalty=presence_penalty,
    frequency_penalty=frequency_penalty,
    response_format=response_format,
    extra_body={
      "top_k": top_k,
      "repetition_penalty": repetition_penalty,
      "min_p": min_p
    }
  )

if stream:
    for chunk in chat_completion_res:
        print(chunk.choices[0].delta.content or "", end="")
else:
    print(chat_completion_res.choices[0].message.content)

Key Features:

  • Unified endpoint: /v3/openai supports OpenAI’s Chat Completions API format.
  • Flexible controls: Adjust temperature, top-p, penalties, and more for tailored results.
  • Streaming & batching: Choose your preferred response mode.

Option 2: Multi-Agent Workflows with OpenAI Agents SDK

Build advanced multi-agent systems by integrating Novita AI with the OpenAI Agents SDK:

Plug-and-play: Use Kimi‑K2‑Instruct in any OpenAI Agents workflow.

Supports handoffs, routing, and tool use: Design agents that can delegate tasks, route requests between specialists, or run functions.

Python integration: Simply point the SDK to Novita’s endpoint (https://api.novita.ai/v3/openai) and use your API key for seamless agent workflows.

Option 3: Connect Kimi‑K2‑Instruct API on Third-Party Platforms

Hugging Face: Use Kimi‑K2‑Instruct in Spaces, pipelines, or with the Transformers library via Novita AI endpoints.

Agent & Orchestration Frameworks: Easily connect Novita AI with partner platforms like Continue, AnythingLLM, LangChain, Dify and Langflow through official connectors and step-by-step integration guides.

OpenAI-Compatible API: Enjoy hassle-free migration and integration with tools such as Cline, Trae, and Cursor that are designed for the OpenAI API standard.

Conclusion

Kimi-K2-Instruct is a powerful, open-access, 1T-parameter MoE model, pushing the frontier in coding, reasoning, and agentic AI.

Now available on Novita AI, it blends massive scale, tool-use intelligence, and long-context processing — all deployable with efficient inference infrastructure. For developers and researchers building the next generation of AI assistants, agents, and reasoning engines, Kimi-K2-Instruct offers a cutting-edge foundation that is powerful, flexible, and production-ready.

Try the Kimi-K2-Instruct Demo on Novita AI !

Novita AI is an AI cloud platform that offers developers an easy way to deploy AI models through a simple API, while also providing affordable and reliable GPU cloud infrastructure for building and scaling.

