Novita AI Partners with Z.ai to Offer GLM-4.5: Unifying Reasoning, Coding, and Agentic AI Capabilities

GLM-4.5 on Novita AI

Today, we’re excited to announce Novita AI’s partnership with Z.ai to bring day-one support for GLM-4.5 on the Novita AI platform as a Z.ai launch partner. This groundbreaking collaboration introduces the world’s most unified AI model series, combining advanced reasoning, sophisticated coding capabilities, and native agentic functionality in a single, powerful framework designed for developers building the next generation of AI applications.

Novita AI now offers the groundbreaking GLM-4.5 model (355B total parameters, 32B active) alongside its lighter sibling, GLM-4.5-Air (106B total, 12B active). Both are built with hybrid reasoning modes: a thinking mode for complex reasoning and tool use, and a non-thinking mode for instant responses. GLM-4.5 ranks 2nd overall across comprehensive benchmarks.

Both models feature a 128K context length and native function calling, available through Novita AI’s optimized inference infrastructure.

⚡ Overall Performance

GLM-4.5 ranks 2nd place and GLM-4.5-Air ranks 5th across 12 benchmarks covering agentic (3), reasoning (7), and coding (2) tasks, compared against models from OpenAI, Anthropic, Google DeepMind, xAI, Alibaba, Moonshot, and DeepSeek.

Previous models excelled in specific areas such as coding, math, or reasoning, but none achieved top performance across all tasks. GLM-4.5 unifies these capabilities in a single model.

Agentic Tasks

GLM-4.5 is a foundation model optimized for agentic tasks. It provides a 128K context length and native function calling. Z.ai measured its agent ability on τ-bench and BFCL-v3 (Berkeley Function Calling Leaderboard v3). On both benchmarks, GLM-4.5 matches the performance of Claude-4-Sonnet.

Web browsing is a popular agentic application that requires complex reasoning and multi-turn tool use. Z.ai evaluated GLM-4.5 on BrowseComp, a challenging web-browsing benchmark consisting of complicated questions that expect short answers. With access to the web browsing tool, GLM-4.5 gives correct answers for 26.4% of all questions, clearly outperforming Claude-4-Opus (18.8%) and coming close to o4-mini-high (28.3%).

| Benchmark | GLM-4.5 | GLM-4.5-Air | o3 | o4-mini-high | GPT-4.1 | Claude 4 Opus | Claude 4 Sonnet | Gemini 2.5 Pro | Qwen3 235B Thinking 2507 | DeepSeek R1 0528 | Kimi K2 | Grok4 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| TAU-bench | 70.1 | 69.4 | 61.2 | 57.4 | 62.0 | 70.5 | 70.3 | 62.5 | 73.2 | 58.7 | 62.6 | 67.5 |
| BFCL v3 (Full) | 77.8 | 76.4 | 72.4 | 67.2 | 68.9 | 61.8 | 75.2 | 61.2 | 72.4 | 63.8 | 71.1 | 66.2 |
| BrowseComp | 26.4 | 21.3 | 49.7 | 28.3 | 4.1 | 18.8 | 14.7 | 7.6 | 4.6 | 3.2 | 7.9 | 32.6 |
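Multi-turn tool use of the kind these benchmarks measure follows a simple loop: the model returns `tool_calls`, the client executes them, and the results are sent back as `tool` messages on the next request. A minimal sketch of the client side of that loop, using the OpenAI-compatible message shapes (the `get_weather` tool and its registry are hypothetical examples, not part of the API):

```python
import json

# Hypothetical tool registry: name -> callable. The model never runs code
# itself; the client executes each requested call and returns the result.
TOOLS = {
    "get_weather": lambda city: {"city": city, "temp_c": 21},
}

def run_tool_calls(tool_calls, registry=TOOLS):
    """Execute the tool calls from one assistant turn and build the
    `tool`-role messages to send back on the next request."""
    messages = []
    for call in tool_calls:
        fn = registry[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])
        result = fn(**args)
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(result),
        })
    return messages

# Shape mirrors one assistant turn from the chat completions API.
assistant_turn = [{
    "id": "call_0",
    "function": {"name": "get_weather", "arguments": '{"city": "Berlin"}'},
}]
replies = run_tool_calls(assistant_turn)
```

The loop repeats until the model answers without requesting further tools; benchmarks like τ-bench score how reliably the model drives this cycle over many turns.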

Reasoning

In thinking mode, GLM-4.5 and GLM-4.5-Air can solve complex reasoning problems in mathematics, science, and logic.
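When thinking mode is enabled, several OpenAI-compatible servers return the model's thinking trace in a separate field alongside the final answer; `reasoning_content` is a common convention, but the exact field name is provider-specific and an assumption here. A small helper for separating the two:

```python
def split_reasoning(message: dict) -> tuple[str, str]:
    """Separate the thinking trace from the final answer in one
    assistant message. Several OpenAI-compatible servers expose the
    trace as `reasoning_content`; the field name varies by provider,
    so check your endpoint's response schema."""
    return message.get("reasoning_content", ""), message.get("content", "")

# Example message shaped like a thinking-mode response
thinking, answer = split_reasoning({
    "role": "assistant",
    "reasoning_content": "The user asks for 2+2; compute directly.",
    "content": "4",
})
```

Keeping the trace separate lets an application log or display the reasoning without mixing it into the user-facing answer.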

| Benchmark | GLM-4.5 | GLM-4.5-Air | o3 | o4-mini-high | Claude 4 Opus | Claude 4 Sonnet | Gemini 2.5 Pro | Gemini 2.5 Flash | DeepSeek R1 0528 | Qwen3-235B Thinking 2507 | Grok4 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MMLU Pro | 84.6 | 81.4 | 85.3 | 83.2 | 87.3 | 84.2 | 86.2 | 83.2 | 84.9 | 84.5 | 86.6 |
| AIME24 | 91.0 | 89.4 | 90.3 | 94.0 | 75.7 | 77.3 | 88.7 | 82.3 | 89.3 | 94.1 | 94.3 |
| MATH 500 | 98.2 | 98.1 | 99.2 | 98.9 | 98.2 | 99.1 | 96.7 | 98.1 | 98.3 | 98.0 | 99.0 |
| SciCode | 41.7 | 37.3 | 41.0 | 46.5 | 39.8 | 40.0 | 42.8 | 39.4 | 40.3 | 42.9 | 45.7 |
| GPQA | 79.1 | 75.0 | 82.7 | 78.4 | 79.6 | 77.7 | 84.4 | 79.0 | 81.3 | 81.1 | 87.7 |
| HLE | 14.4 | 10.6 | 20.0 | 17.5 | 11.7 | 8.5 | 21.1 | 11.1 | 14.9 | 15.8 | 23.9 |
| LiveCodeBench (2407-2501) | 72.9 | 70.7 | 78.4 | 80.4 | 63.6 | 58.0 | 80.1 | 69.5 | 77.0 | 78.2 | 81.9 |
| AA-Index (Estimated) | 67.7 | 64.8 | 70.0 | 69.8 | 64.4 | 62.7 | 70.5 | 65.1 | 68.3 | 69.4 | 73.2 |

Coding

GLM-4.5 also excels at coding, both building a project from scratch and agentically solving tasks in existing codebases. It integrates seamlessly with existing coding toolkits such as Claude Code, Roo Code, and CodeGeeX. To evaluate coding capability, Z.ai compared models on SWE-bench Verified and Terminal-Bench.

| Benchmark | GLM-4.5 | GLM-4.5-Air | o3 | o4-mini-high | GPT-4.1 | Claude 4 Opus | Claude 4 Sonnet | Gemini 2.5 Pro | Gemini 2.5 Flash | Qwen3 235B Thinking 2507 | Qwen3 235B | DeepSeek R1 0528 | Kimi K2 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SWE-bench Verified | 64.2 | 57.6 | 69.1 | 54.8 | 48.6 | 67.8 | 70.4 | 49.0 | 60.4 | 35.0 | 36.2 | 41.4 | 65.4 |
| Terminal-Bench | 37.5 | 30.0 | 30.2 | 18.5 | 30.3 | 43.2 | 35.5 | 25.3 | 16.8 | 6.3 | 6.6 | 17.5 | 25.0 |

To evaluate GLM-4.5’s agentic coding capabilities in real-world scenarios, Z.ai used Claude Code to conduct comprehensive testing against Claude-4-Sonnet, Kimi K2, and Qwen3-Coder on 52 coding tasks covering frontend development, tool development, data analysis, testing, and algorithm applications. GLM-4.5 achieves a 53.9% win rate against Kimi K2 and an 80.8% win rate against Qwen3-Coder, while showing room for improvement against Claude-4-Sonnet.

GLM-4.5's Experience with Agentic Coding in Real-world Development Scenarios

Notably, GLM-4.5 achieves the highest average tool calling success rate at 90.6%, outperforming Claude-4-Sonnet (89.5%), Kimi-K2 (86.2%), and Qwen3-Coder (77.1%), demonstrating superior reliability and efficiency in agentic coding tasks.

Average Tool Calling Success Rate Comparison

🚀 Get Started with Novita AI

Use the Playground (No Coding Required)

  • Instant Access: Sign up and start experimenting with GLM-4.5 in seconds
  • Interactive Interface: Test complex reasoning prompts and visualize structured outputs in real-time
  • Model Comparison: Compare GLM-4.5 with other leading models for your specific use case

Integrate via API (For Developers)

Connect GLM-4.5 to your applications with Novita AI’s unified REST API.

Option 1: Direct API Integration (Python Example)

from openai import OpenAI
  
client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",
    api_key="<YOUR_NOVITA_API_KEY>",  # replace with your Novita AI API key
)

model = "zai-org/glm-4.5"
stream = True # or False
max_tokens = 65536
system_content = "Be a helpful assistant"
temperature = 1
top_p = 1
min_p = 0
top_k = 50
presence_penalty = 0
frequency_penalty = 0
repetition_penalty = 1
response_format = { "type": "text" }

chat_completion_res = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "system",
            "content": system_content,
        },
        {
            "role": "user",
            "content": "Hi there!",
        }
    ],
    stream=stream,
    max_tokens=max_tokens,
    temperature=temperature,
    top_p=top_p,
    presence_penalty=presence_penalty,
    frequency_penalty=frequency_penalty,
    response_format=response_format,
    extra_body={
        "top_k": top_k,
        "repetition_penalty": repetition_penalty,
        "min_p": min_p
    }
)

if stream:
    for chunk in chat_completion_res:
        print(chunk.choices[0].delta.content or "", end="")
else:
    print(chat_completion_res.choices[0].message.content)
  
  

Key Features:

  • OpenAI-Compatible API for seamless integration
  • Flexible parameter control for fine-tuning responses
  • Streaming support for real-time responses

Option 2: Multi-Agent Workflows with OpenAI Agents SDK

Build sophisticated multi-agent systems using GLM-4.5:

  • Plug-and-Play Integration: Use GLM-4.5 in any OpenAI Agents workflow
  • Advanced Agent Capabilities: Support for handoffs, routing, and tool integration, backed by a 90.6% tool-calling success rate
  • Scalable Architecture: Design agents that leverage GLM-4.5’s unified reasoning, coding, and agentic capabilities
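Handoffs and routing ultimately reduce to one agent delegating a task to a more specialized one. A minimal, SDK-agnostic sketch of that pattern (the agent names and the keyword router are illustrative; the Agents SDK layers tool use, memory, and structured handoffs on top of this same idea):

```python
# Each "agent" here is just a system prompt; a router picks which one
# handles the user's request before the model is called.
AGENTS = {
    "coder": "You write and debug code.",
    "researcher": "You answer questions with cited sources.",
}

def route(task: str) -> str:
    """Naive keyword router: pick a specialist agent for the task."""
    if any(w in task.lower() for w in ("bug", "code", "function")):
        return "coder"
    return "researcher"

def build_messages(task: str) -> list[dict]:
    """Assemble the chat messages for whichever agent won the route."""
    agent = route(task)
    return [
        {"role": "system", "content": AGENTS[agent]},
        {"role": "user", "content": task},
    ]

msgs = build_messages("Fix the bug in my sorting function")
```

In practice the router itself can be an LLM call, and a handoff is just the selected agent's conversation continuing with its own system prompt and tools.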

Connect with Third-Party Platforms

  • Development Tools: Seamlessly integrate with popular IDEs and development environments like Cursor and Cline through OpenAI-compatible APIs
  • Orchestration Frameworks: Connect with LangChain, Dify, Langflow, and other AI orchestration platforms using official connectors
  • Hugging Face Integration: Use GLM-4.5 in Spaces, pipelines, or with the Transformers library via Novita AI endpoints

🔬 Technical Innovation of GLM-4.5

MoE Architecture Excellence

GLM-4.5 adopts a Mixture-of-Experts (MoE) architecture that improves compute efficiency for both training and inference. Compared to DeepSeek-V3, the design reduces width (hidden dimension and number of routed experts) while increasing depth (number of layers).

Key technical features:

  • Grouped-Query Attention with partial RoPE (continued from ChatGLM2)
  • QK-Norm to stabilize attention logits range
  • Muon optimizer for accelerated convergence and larger batch size tolerance
  • MTP (Multi-Token Prediction) layer supporting speculative decoding during inference
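The efficiency of an MoE layer comes from routing each token to only a few experts, which is why GLM-4.5 activates 32B of its 355B parameters per token. A toy sketch of top-k gating in pure Python (the expert count and k below are illustrative, not GLM-4.5's actual configuration):

```python
import math

def top_k_gate(logits: list[float], k: int = 2):
    """Toy MoE router: softmax over expert logits, keep the top-k
    experts, and renormalize their weights. Only these k experts run
    for the token, which is why active params << total params."""
    exps = [math.exp(x - max(logits)) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    top = sorted(range(len(probs)), key=lambda i: -probs[i])[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

# 8 experts, route a token to the 2 with the highest gate scores
routing = top_k_gate([0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3], k=2)
```

The layer's output is then the weighted sum of only the selected experts' outputs, so compute per token scales with k rather than with the total number of experts.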

Advanced Training Pipeline

Pre-training: Two-stage approach

  • 15T tokens on general pre-training corpus
  • 7T tokens on code & reasoning corpus

Mid-training: Domain-specific optimization

  • Repo-Level Code Data (500B tokens)
  • Synthetic Reasoning Data (500B tokens)
  • Long Context & Agent Data (100B tokens)

Post-training: Sophisticated hybrid approach

  1. Expert Training: Separate models for Reasoning, Agentic, and General domains through SFT and specialized RL
  2. Unified Training: Knowledge distillation combining experts into single model via large-scale SFT self-distillation, followed by three-stage RL alignment

slime: Revolutionary RL Infrastructure

GLM-4.5’s training is powered by slime, an open-source RL infrastructure designed for large-scale models:

  • Flexible Hybrid Training Architecture: Supports both synchronous co-located training and disaggregated asynchronous training
  • Decoupled Agent-Oriented Design: Separates rollout engines from training engines for optimized performance
  • Accelerated Data Generation: Mixed-precision inference using FP8 for data generation while maintaining BF16 stability for training

🎯 Ready to Experience Unified AI?

Try GLM-4.5 and GLM-4.5-Air today on the Novita AI Platform. Experience firsthand how unified AI capabilities are transforming what’s possible when reasoning, coding, and agentic functionality converge in optimized, production-ready infrastructure.

Start Building Today

Novita AI is an AI cloud platform that gives developers an easy way to deploy AI models through a simple API, while also providing an affordable and reliable GPU cloud for building and scaling.

