KAT-Dev-32B on Novita AI: Benchmarking Open-Source Coding Power

KAT-Dev-32B, now available on Novita AI, is an open-source 32B-parameter model built for software engineering. A multi-stage training process lets it balance efficiency and performance while remaining fully open to researchers and developers. On SWE-Bench Verified, it resolves 62.4% of tasks, ranking 5th among open-source models of all scales. It was developed by Kwaipilot, the AI exploration team of Kuaishou, to bring advanced code intelligence to developers worldwide.

Current pricing on Novita AI: 65,536-token context window, $0.15 per 1M input tokens, $0.40 per 1M output tokens.
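To make the pricing concrete, here is a minimal sketch that estimates the cost of a single request from the rates quoted above. The function name and the example token counts are illustrative assumptions, and the rates should be checked against Novita AI's current pricing before being relied on.

```python
from decimal import Decimal

# Rates as quoted in this article (USD per 1M tokens); they may change.
INPUT_PRICE_PER_M = Decimal("0.15")
OUTPUT_PRICE_PER_M = Decimal("0.40")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated USD cost of one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = INPUT_PRICE_PER_M * input_tokens + OUTPUT_PRICE_PER_M * output_tokens
    return total / TOKENS_PER_MILLION


# Example: a 20,000-token prompt that produces a 2,000-token answer.
print(f"${estimate_cost(20_000, 2_000)}")
```

At these rates, 20,000 input tokens cost $0.0030 and 2,000 output tokens cost $0.0008, so the request comes to $0.0038.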

What is KAT-Dev-32B?

KAT-Dev-32B is a 32B-parameter open-source large language model designed for software engineering tasks. It was developed by Kwaipilot, Kuaishou’s AI research team exploring cutting-edge large model capabilities. Built on top of Qwen3-32B, it has been optimized for code generation, bug fixing, refactoring, testing, and deployment workflows. It is released under the kwaipilot license, is available on Hugging Face, and can be accessed directly through the Novita AI Playground.

What Makes KAT-Dev-32B Different?

KAT-Dev-32B is distinguished by a task-focused training pipeline that strengthens agent-style reasoning and developer workflow integration. Unlike generic LLMs, it supports long multi-turn interactions, tool usage, and developer-oriented scenarios such as debugging or configuration. On Novita AI, these strengths are supported by scalable infrastructure and easy-to-use interfaces, giving users instant access to open-source coding intelligence.

How is KAT-Dev-32B Trained?

The performance of KAT-Dev-32B is the result of three carefully engineered stages of training and tuning.

Mid-Training

This stage builds foundational skills, from tool use in sandboxed environments to handling long multi-turn dialogues and understanding Git commit/PR data. It also incorporates domain-specific coding knowledge and instruction-following capabilities.

Supervised & Reinforcement Finetuning

In this stage, the training data is curated to cover eight task types (such as bug fixing, optimization, refactoring, and code understanding) and eight programming scenarios (ranging from ML/AI to security engineering). Before reinforcement learning, a reinforcement finetuning (RFT) stage adds “teacher trajectories”: examples from expert human engineers that improve stability and generalization.

Agentic RL Scaling

The final scaling phase addresses the efficiency challenges of large-scale RL with three techniques:

  • Prefix caching for faster probability computation
  • Entropy-based trajectory pruning to preserve only high-value nodes
  • SeamlessFlow architecture to decouple training from agent behavior and maximize throughput
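The article does not describe how the pruning is implemented, so the following is only a toy sketch of the general idea behind entropy-based selection, not Kwaipilot's method. It assumes that each step in a trajectory exposes a next-token probability distribution and that the "high-value" nodes are the steps with the highest Shannon entropy; both assumptions, and the function names, are illustrative.

```python
import math
from typing import List, Sequence


def shannon_entropy(probs: Sequence[float]) -> float:
    """Entropy (in nats) of one probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def prune_by_entropy(step_distributions: Sequence[Sequence[float]], keep: int) -> List[int]:
    """Return indices of the `keep` highest-entropy steps, in original order.

    Assumption: high entropy marks a step as worth keeping for training.
    """
    if keep < 0:
        raise ValueError("keep must be non-negative")
    scored = [(shannon_entropy(dist), idx) for idx, dist in enumerate(step_distributions)]
    # Sort by entropy descending; break ties by earlier position.
    top = sorted(scored, key=lambda item: (-item[0], item[1]))[:keep]
    return sorted(idx for _, idx in top)


steps = [
    [1.0, 0.0],                # fully confident step, entropy 0
    [0.5, 0.5],                # uncertain two-way choice
    [0.9, 0.1],                # mostly confident
    [0.25, 0.25, 0.25, 0.25],  # uncertain four-way choice
]
print(prune_by_entropy(steps, keep=2))  # [1, 3]
```

Only the retained steps would then contribute to the RL update, which is where the compute saving would come from in a scheme like this.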

How Does KAT-Dev-32B Perform on SWE-Bench?

KAT-Dev-32B reaches 62.4% resolution on SWE-Bench Verified, ranking 5th among open-source models of varying scales. This demonstrates that an efficiently trained 32B model can achieve real-world coding reliability comparable to much larger systems.

Performance of open-source models on SWE-Bench Verified (KAT-Dev-32B highlighted)
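For a sense of scale, the resolve rate can be converted into a task count. This short sketch assumes the standard 500-task SWE-Bench Verified set; the helper name is illustrative.

```python
SWE_BENCH_VERIFIED_TASKS = 500  # size of the SWE-Bench Verified set


def resolved_tasks(resolve_rate_percent: float,
                   total_tasks: int = SWE_BENCH_VERIFIED_TASKS) -> int:
    """Convert a resolve rate in percent into a whole number of tasks."""
    return round(resolve_rate_percent * total_tasks / 100)


print(resolved_tasks(62.4))  # 312
```

In other words, a 62.4% resolve rate corresponds to 312 of the 500 tasks.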

Getting Started with KAT-Dev-32B on Novita AI

Accessing KAT-Dev-32B through Novita AI is simple, with options for both non-technical and developer users.

Playground Access

  • Instant access: Sign up and start experimenting with KAT-Dev-32B in seconds
  • Interactive interface: Test coding prompts, debug applications, and visualize responses in real time
  • Model comparison: Compare KAT-Dev-32B against other models to evaluate suitability

The Playground is ideal for prototyping, debugging, and exploring model behaviors without any setup.

API Integration

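For developers, Novita AI provides a unified, OpenAI-compatible REST API for integrating KAT-Dev-32B into applications. The example below uses the official OpenAI Python client pointed at Novita AI's endpoint.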

from openai import OpenAI

# Point the standard OpenAI client at Novita AI's endpoint.
client = OpenAI(
    base_url="https://api.novita.ai/openai",
    api_key="<YOUR_API_KEY>",  # replace with your Novita AI API key
)

model = "kwaipilot/kat-dev"
stream = True  # set to False to receive the full response at once
max_tokens = 32768
system_content = "Be a helpful assistant"

# Sampling parameters supported by the OpenAI client directly
temperature = 1
top_p = 1
presence_penalty = 0
frequency_penalty = 0
response_format = {"type": "text"}

# Additional sampling parameters, passed through extra_body
min_p = 0
top_k = 50
repetition_penalty = 1

chat_completion_res = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "system",
            "content": system_content,
        },
        {
            "role": "user",
            "content": "Hi there!",
        },
    ],
    stream=stream,
    max_tokens=max_tokens,
    temperature=temperature,
    top_p=top_p,
    presence_penalty=presence_penalty,
    frequency_penalty=frequency_penalty,
    response_format=response_format,
    extra_body={
        "top_k": top_k,
        "repetition_penalty": repetition_penalty,
        "min_p": min_p,
    },
)

if stream:
    # Print tokens as they arrive; skip chunks that carry no choices.
    for chunk in chat_completion_res:
        if chunk.choices:
            print(chunk.choices[0].delta.content or "", end="")
else:
    print(chat_completion_res.choices[0].message.content)

This flexible integration supports temperature, penalties, repetition control, and streaming outputs for production workflows.
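In long multi-turn sessions, the conversation history plus the requested output must fit within the 65,536-token context window. The following is a minimal sketch of one way to trim older turns before each request. The helper names and the four-characters-per-token estimate are assumptions for illustration only; a real integration would count tokens with the model's actual tokenizer.

```python
from typing import Dict, List

Message = Dict[str, str]

CONTEXT_WINDOW = 65_536     # context window quoted in this article
MAX_OUTPUT_TOKENS = 32_768  # max_tokens used in the request above


def rough_token_count(text: str) -> int:
    """Crude estimate (about 4 characters per token); not the real tokenizer."""
    return max(1, len(text) // 4)


def trim_history(messages: List[Message], budget: int) -> List[Message]:
    """Keep system messages plus the most recent turns that fit in `budget`."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    used = sum(rough_token_count(m["content"]) for m in system)
    kept: List[Message] = []
    # Walk backwards so the newest turns are kept first.
    for message in reversed(turns):
        cost = rough_token_count(message["content"])
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    return system + kept[::-1]


history = [
    {"role": "system", "content": "Be a helpful assistant"},
    {"role": "user", "content": "Why does my test fail?"},
    {"role": "assistant", "content": "The fixture is never initialized."},
    {"role": "user", "content": "How do I fix it?"},
]
# Reserve room for the reply, then pass the result as `messages`.
trimmed = trim_history(history, CONTEXT_WINDOW - MAX_OUTPUT_TOKENS)
print(len(trimmed))
```

Dropping the oldest turns first is the simplest policy. Summarizing older turns is a common alternative when earlier context still matters.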

Third-Party Tools

Novita AI ensures compatibility with the broader ecosystem:

  • Works with coding tools such as Cursor, Qwen Code, Codex, and Cline
  • Connects with orchestration tools like LangChain, Dify, CrewAI, and Langflow
  • Provides Hugging Face inference support for ecosystem-wide deployment

Conclusion

KAT-Dev-32B on Novita AI makes advanced code intelligence accessible through open-source availability and scalable cloud infrastructure. With its three-stage training pipeline, agentic RL scaling, and strong SWE-Bench benchmark results, it stands as a reliable solution for both research and production coding tasks. Developed by Kwaipilot, Kuaishou’s AI exploration team, it combines cutting-edge research with real-world software engineering applications.

Start building smarter today: explore KAT-Dev-32B in the Novita AI Playground or integrate it directly via API to bring next-generation coding performance into your workflows.

Novita AI is an AI cloud platform that gives developers an easy way to deploy AI models through a simple API, along with an affordable and reliable GPU cloud for building and scaling.

