Today, we’re excited to announce that Novita AI is a Z.ai launch partner, bringing day-one support for GLM-4.5 to the Novita AI platform. This collaboration introduces a unified AI model series that combines advanced reasoning, sophisticated coding capabilities, and native agentic functionality in a single, powerful framework designed for developers building the next generation of AI applications.
Novita AI now offers GLM-4.5 (355B total parameters, 32B active) and its lighter sibling GLM-4.5-Air (106B total parameters, 12B active). Both are built with hybrid reasoning: a thinking mode for complex reasoning and tool use, and a non-thinking mode for instant responses. GLM-4.5 ranks 2nd place overall across comprehensive benchmarks.

Both models feature 128k context length and native function calling capacity, available through Novita AI’s optimized inference infrastructure.
⚡ Overall Performance
GLM-4.5 ranks 2nd place and GLM-4.5-Air ranks 5th across 12 benchmarks covering agentic (3), reasoning (7), and coding (2) tasks, compared against models from OpenAI, Anthropic, Google DeepMind, xAI, Alibaba, Moonshot, and DeepSeek.
GLM-4.5 unifies these capabilities in one model: previous models excelled in specific areas such as coding, math, or reasoning, but none achieved top performance across all tasks.
Agentic Tasks
GLM-4.5 is a foundation model optimized for agentic tasks. It provides 128k context length and native function calling capacity. Z.ai measured its agent ability on τ-bench and BFCL-v3 (Berkeley Function Calling Leaderboard v3). On both benchmarks, GLM-4.5 matches the performance of Claude-4-Sonnet.
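Native function calling is exposed through the standard OpenAI-compatible `tools` parameter. A minimal sketch of how a tool might be declared and its call parsed; the `get_weather` tool and its schema are illustrative, not something the platform provides:

```python
import json

# Hypothetical tool definition: `get_weather` is illustrative only.
# The schema follows the standard OpenAI function-calling format,
# which GLM-4.5 understands natively.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

def request_tool_call(client, model="zai-org/glm-4.5"):
    """Ask GLM-4.5 a question that should trigger the tool, then return
    the tool name and parsed arguments it chose. `client` is an
    openai.OpenAI instance pointed at Novita AI's endpoint."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "What's the weather in Paris?"}],
        tools=[weather_tool],
    )
    call = resp.choices[0].message.tool_calls[0]
    return call.function.name, json.loads(call.function.arguments)
```

The model decides whether to call the tool; your code executes it and feeds the result back as a `tool` role message for the final answer.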
Web browsing is a popular agentic application that requires complex reasoning and multi-turn tool use. Z.ai evaluated GLM-4.5 on BrowseComp, a challenging web-browsing benchmark consisting of complicated questions with short expected answers. With access to the web browsing tool, GLM-4.5 answers 26.4% of all questions correctly, clearly outperforming Claude-4-Opus (18.8%) and coming close to o4-mini-high (28.0%).
| Benchmark | GLM-4.5 | GLM-4.5-Air | o3 | o4-mini-high | GPT-4.1 | Claude 4 Opus | Claude 4 Sonnet | Gemini 2.5 Pro | Qwen3 235B Thinking 2507 | DeepSeek R1 0528 | Kimi K2 | Grok4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TAU-bench | 70.1 | 69.4 | 61.2 | 57.4 | 62.0 | 70.5 | 70.3 | 62.5 | 73.2 | 58.7 | 62.6 | 67.5 |
| BFCL v3 (Full) | 77.8 | 76.4 | 72.4 | 67.2 | 68.9 | 61.8 | 75.2 | 61.2 | 72.4 | 63.8 | 71.1 | 66.2 |
| BrowseComp | 26.4 | 21.3 | 49.7 | 28.3 | 4.1 | 18.8 | 14.7 | 7.6 | 4.6 | 3.2 | 7.9 | 32.6 |
Reasoning
In thinking mode, GLM-4.5 and GLM-4.5-Air can solve complex reasoning problems in mathematics, science, and logic.
| Benchmark | GLM-4.5 | GLM-4.5-Air | o3 | o4-mini-high | Claude 4 Opus | Claude 4 Sonnet | Gemini 2.5 Pro | Gemini 2.5 Flash | DeepSeek R1 0528 | Qwen3-235B Thinking 2507 | Grok4 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MMLU Pro | 84.6 | 81.4 | 85.3 | 83.2 | 87.3 | 84.2 | 86.2 | 83.2 | 84.9 | 84.5 | 86.6 |
| AIME24 | 91.0 | 89.4 | 90.3 | 94.0 | 75.7 | 77.3 | 88.7 | 82.3 | 89.3 | 94.1 | 94.3 |
| MATH 500 | 98.2 | 98.1 | 99.2 | 98.9 | 98.2 | 99.1 | 96.7 | 98.1 | 98.3 | 98.0 | 99.0 |
| SciCode | 41.7 | 37.3 | 41.0 | 46.5 | 39.8 | 40.0 | 42.8 | 39.4 | 40.3 | 42.9 | 45.7 |
| GPQA | 79.1 | 75.0 | 82.7 | 78.4 | 79.6 | 77.7 | 84.4 | 79.0 | 81.3 | 81.1 | 87.7 |
| HLE | 14.4 | 10.6 | 20.0 | 17.5 | 11.7 | 8.5 | 21.1 | 11.1 | 14.9 | 15.8 | 23.9 |
| LiveCodeBench (2407-2501) | 72.9 | 70.7 | 78.4 | 80.4 | 63.6 | 58.0 | 80.1 | 69.5 | 77.0 | 78.2 | 81.9 |
| AA-Index (Estimated) | 67.7 | 64.8 | 70.0 | 69.8 | 64.4 | 62.7 | 70.5 | 65.1 | 68.3 | 69.4 | 73.2 |
Coding
GLM-4.5 also excels at coding, both at building a project from scratch and at agentically solving tasks in existing codebases. It combines seamlessly with existing coding toolkits such as Claude Code, Roo Code, and CodeGeex. To evaluate coding capability, Z.ai compared models on SWE-bench Verified and Terminal-Bench.
| Benchmark | GLM-4.5 | GLM-4.5-Air | o3 | o4-mini-high | GPT-4.1 | Claude 4 Opus | Claude 4 Sonnet | Gemini 2.5 Pro | Gemini 2.5 Flash | Qwen3 235B Thinking 2507 | Qwen3 235B | DeepSeek R1 0528 | Kimi K2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SWE-bench Verified | 64.2 | 57.6 | 69.1 | 54.8 | 48.6 | 67.8 | 70.4 | 49.0 | 60.4 | 35.0 | 36.2 | 41.4 | 65.4 |
| Terminal-Bench | 37.5 | 30.0 | 30.2 | 18.5 | 30.3 | 43.2 | 35.5 | 25.3 | 16.8 | 6.3 | 6.6 | 17.5 | 25.0 |
To evaluate GLM-4.5’s agentic coding capabilities in real-world scenarios, Z.ai used Claude Code to run comprehensive tests against Claude-4-Sonnet, Kimi K2, and Qwen3-Coder on 52 coding tasks covering frontend development, tool development, data analysis, testing, and algorithm applications. GLM-4.5 wins against Kimi K2 on 53.9% of tasks and achieves an 80.8% win rate against Qwen3-Coder, while showing room for improvement against Claude-4-Sonnet.

Notably, GLM-4.5 achieves the highest average tool calling success rate at 90.6%, outperforming Claude-4-Sonnet (89.5%), Kimi-K2 (86.2%), and Qwen3-Coder (77.1%), demonstrating superior reliability and efficiency in agentic coding tasks.

🚀 Get Started with Novita AI
Use the Playground (No Coding Required)
- Instant Access: Sign up and start experimenting with GLM-4.5 in seconds
- Interactive Interface: Test complex reasoning prompts and visualize structured outputs in real-time
- Model Comparison: Compare GLM-4.5 with other leading models for your specific use case
Integrate via API (For Developers)
Connect GLM-4.5 to your applications with Novita AI’s unified REST API.
Option 1: Direct API Integration (Python Example)
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",
    api_key="<YOUR_NOVITA_API_KEY>",  # keep keys out of source; load from an env var in production
)

model = "zai-org/glm-4.5"
stream = True  # or False
max_tokens = 65536
system_content = "Be a helpful assistant"
temperature = 1
top_p = 1
min_p = 0
top_k = 50
presence_penalty = 0
frequency_penalty = 0
repetition_penalty = 1
response_format = {"type": "text"}

chat_completion_res = client.chat.completions.create(
    model=model,
    messages=[
        {"role": "system", "content": system_content},
        {"role": "user", "content": "Hi there!"},
    ],
    stream=stream,
    max_tokens=max_tokens,
    temperature=temperature,
    top_p=top_p,
    presence_penalty=presence_penalty,
    frequency_penalty=frequency_penalty,
    response_format=response_format,
    # Sampling parameters beyond the OpenAI spec go through extra_body
    extra_body={
        "top_k": top_k,
        "repetition_penalty": repetition_penalty,
        "min_p": min_p,
    },
)

if stream:
    for chunk in chat_completion_res:
        print(chunk.choices[0].delta.content or "", end="")
else:
    print(chat_completion_res.choices[0].message.content)
```
Key Features:
- OpenAI-Compatible API for seamless integration
- Flexible parameter control for fine-tuning responses
- Streaming support for real-time responses
Option 2: Multi-Agent Workflows with OpenAI Agents SDK
Build sophisticated multi-agent systems using GLM-4.5:
- Plug-and-Play Integration: Use GLM-4.5 in any OpenAI Agents workflow
- Advanced Agent Capabilities: Support for handoffs, routing, and tool integration with 90.6% success rate
- Scalable Architecture: Design agents that leverage GLM-4.5’s unified reasoning, coding, and agentic capabilities
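The handoff/routing pattern described above can be sketched without the SDK itself. Below is a minimal hand-rolled version; the agent names and the keyword triage rule are purely illustrative, and in a real workflow each agent would call GLM-4.5 through the same OpenAI-compatible client (GLM-4.5 itself could do the triage via a function call):

```python
# Minimal hand-rolled sketch of the triage/handoff pattern that the
# OpenAI Agents SDK automates. Each "agent" here is just a system prompt.
AGENTS = {
    "coder": {"system": "You write, debug, and refactor code."},
    "researcher": {"system": "You answer questions using search tools."},
}

def route(task: str) -> str:
    """Toy keyword triage: pick which specialist agent handles the task."""
    coding_hints = ("bug", "code", "implement", "refactor")
    return "coder" if any(k in task.lower() for k in coding_hints) else "researcher"

def handoff(task: str) -> dict:
    """Return the messages the chosen agent would send to GLM-4.5."""
    agent = AGENTS[route(task)]
    return {"system": agent["system"], "user": task}
```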
Connect with Third-Party Platforms
- Development Tools: Seamlessly integrate with popular IDEs and development environments like Cursor and Cline through OpenAI-compatible APIs
- Orchestration Frameworks: Connect with LangChain, Dify, Langflow, and other AI orchestration platforms using official connectors
- Hugging Face Integration: Use GLM-4.5 in Spaces, pipelines, or with the Transformers library via Novita AI endpoints
🔬 Technical Innovation of GLM-4.5
MoE Architecture Excellence
GLM-4.5 adopts a Mixture-of-Experts (MoE) architecture that improves compute efficiency for both training and inference. Compared to DeepSeek-V3, the design reduces width (hidden dimension and number of routed experts) while increasing depth (number of layers).
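The core MoE idea can be sketched in a few lines. This is a toy top-k router, not GLM-4.5's actual routing code:

```python
import math

# Toy top-k expert routing: the mechanism that lets an MoE model keep
# active parameters (32B for GLM-4.5) far below total parameters (355B).
# A router scores every expert per token, and only the top-k experts run.
def route_topk(scores, k=2):
    """Indices of the k highest-scoring experts for one token."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

def moe_layer(x, experts, scores, k=2):
    """Combine the selected experts' outputs, weighted by a softmax over
    the chosen scores."""
    idx = route_topk(scores, k)
    exps = [math.exp(scores[i]) for i in idx]
    total = sum(exps)
    return sum((w / total) * experts[i](x) for w, i in zip(exps, idx))
```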
Key technical features:
- Grouped-Query Attention with partial RoPE (continued from ChatGLM2)
- QK-Norm to stabilize attention logits range
- Muon optimizer for accelerated convergence and larger batch size tolerance
- MTP (Multi-Token Prediction) layer supporting speculative decoding during inference
Advanced Training Pipeline
Pre-training: Two-stage approach
- 15T tokens on general pre-training corpus
- 7T tokens on code & reasoning corpus
Mid-training: Domain-specific optimization
- Repo-Level Code Data (500B tokens)
- Synthetic Reasoning Data (500B tokens)
- Long Context & Agent Data (100B tokens)
Post-training: Sophisticated hybrid approach
- Expert Training: Separate models for Reasoning, Agentic, and General domains through SFT and specialized RL
- Unified Training: Knowledge distillation combining experts into single model via large-scale SFT self-distillation, followed by three-stage RL alignment
slime: Revolutionary RL Infrastructure
GLM-4.5’s training is powered by slime, an open-source RL infrastructure designed for large-scale models:
- Flexible Hybrid Training Architecture: Supports both synchronous co-located training and disaggregated asynchronous training
- Decoupled Agent-Oriented Design: Separates rollout engines from training engines for optimized performance
- Accelerated Data Generation: Mixed-precision inference using FP8 for data generation while maintaining BF16 stability for training
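The FP8-rollout idea trades a little precision for much cheaper generation. The round trip below simulates low-precision storage with a simple scaled 8-bit uniform grid; it is an illustration only, not the real E4M3/E5M2 FP8 formats:

```python
# Toy low-precision round trip: values are snapped to a signed 8-bit grid
# (scale chosen from the max magnitude), mimicking how FP8 rollouts cut
# generation cost while training stays in BF16 for stability.
def quantize_roundtrip(xs, bits=8):
    levels = 2 ** (bits - 1) - 1          # 127 for 8 bits
    scale = max(abs(x) for x in xs) / levels
    return [round(x / scale) * scale for x in xs]
```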
🎯 Ready to Experience Unified AI?
Try GLM-4.5 and GLM-4.5-Air today on the Novita AI Platform. Experience firsthand how unified AI capabilities are transforming what’s possible when reasoning, coding, and agentic functionality converge in optimized, production-ready infrastructure.
Novita AI is an AI cloud platform that gives developers an easy way to deploy AI models through a simple API, along with an affordable, reliable GPU cloud for building and scaling.