How to Access GLM-4.5-Air: Complete Integration Guide


GLM-4.5-Air is a 106-billion-parameter hybrid reasoning model that combines thinking mode for complex tasks with non-thinking mode for immediate responses. Novita AI provides enterprise-grade infrastructure to access GLM-4.5-Air with a full 200K token context window, OpenAI-compatible APIs, and seamless integration with popular development tools.

This guide covers three access methods, third-party platform integrations, and production best practices. Whether you’re prototyping in the Playground or building multi-agent systems, you’ll learn how to leverage GLM-4.5-Air’s capabilities on Novita AI.

Current pricing on Novita AI: 131,072-token context, $0.13 per 1M input tokens, $0.85 per 1M output tokens
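At these rates, per-request cost is easy to estimate. The sketch below hardcodes the listed prices; the token counts in the example are illustrative:

```python
# Illustrative cost estimate at Novita AI's listed GLM-4.5-Air rates.
INPUT_PRICE_PER_M = 0.13   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 0.85  # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for one request."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a 10,000-token prompt with a 2,000-token response
print(f"${estimate_cost(10_000, 2_000):.4f}")  # → $0.0030
```

Output-heavy workloads dominate the bill, since output tokens cost roughly 6.5x input tokens at these rates.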

What is GLM-4.5-Air?

GLM-4.5-Air is an open-source hybrid reasoning model developed by Zhipu AI with 106 billion total parameters and 12 billion active parameters. Released under the MIT license, it operates in two modes: thinking mode for complex reasoning and tool usage, and non-thinking mode for immediate responses. The model ranks 3rd among proprietary and open-source models with a benchmark score of 59.8, making it competitive for intelligent agents, code generation, and conversational AI.

Key Specifications:

  • Total Parameters: 106 billion
  • Active Parameters: 12 billion (efficient inference)
  • Context Window: 200K tokens
  • License: MIT (commercial-friendly)
  • Modes: Hybrid reasoning (thinking + non-thinking)
  • Languages: English and Chinese
  • Benchmark Score: 59.8 (ranks 3rd globally)

Why Use Novita AI for GLM-4.5-Air?

Novita AI provides several advantages for deploying GLM-4.5-Air:

  • Extended Context Window: Full 200K token access for processing large documents
  • OpenAI-Compatible and Anthropic-Compatible API: Drop-in replacement for existing OpenAI and Anthropic integrations
  • Multiple Access Methods: Playground, REST API, and multi-agent SDK support
  • Official Hugging Face Provider: Guaranteed ecosystem compatibility
  • Production-Ready Infrastructure: Scalable with reliable uptime and performance

Getting Started: Three Access Methods

Playground (No Coding Required)

The fastest way to experience GLM-4.5-Air is through Novita AI’s Playground. Sign up and start experimenting in seconds through an interactive web interface. Test prompts, see outputs in real-time with the full 200K context window, and compare GLM-4.5-Air with other leading models.

Steps:

  1. Sign up at novita.ai
  2. Navigate to the Playground section
  3. Select “GLM-4.5-Air” (zai-org/glm-4.5-air) from the model dropdown
  4. Configure parameters (temperature, max tokens, context window)
  5. Enter your prompt and receive real-time responses

Best For:

  • Testing model capabilities before development
  • Prototyping conversation flows
  • Comparing GLM-4.5-Air with alternative models
  • Non-technical team members evaluating the model

Direct API Integration

For production applications, integrate GLM-4.5-Air using Novita AI’s REST API with OpenAI-compatible endpoints. The API follows OpenAI’s standard, making migration seamless.

Python API Example

import os
from openai import OpenAI

# Read the API key from an environment variable rather than hardcoding it
client = OpenAI(
    base_url="https://api.novita.ai/openai",
    api_key=os.environ["NOVITA_API_KEY"],
)

model = "zai-org/glm-4.5-air"
stream = True  # set False for a single, non-streamed response
max_tokens = 49152
system_content = "Be a helpful assistant"
temperature = 1
top_p = 1
min_p = 0
top_k = 50
presence_penalty = 0
frequency_penalty = 0
repetition_penalty = 1
response_format = { "type": "text" }

chat_completion_res = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "system",
            "content": system_content,
        },
        {
            "role": "user",
            "content": "Hi there!",
        }
    ],
    stream=stream,
    max_tokens=max_tokens,
    temperature=temperature,
    top_p=top_p,
    presence_penalty=presence_penalty,
    frequency_penalty=frequency_penalty,
    response_format=response_format,
    extra_body={
      "top_k": top_k,
      "repetition_penalty": repetition_penalty,
      "min_p": min_p
    }
  )

if stream:
    for chunk in chat_completion_res:
        print(chunk.choices[0].delta.content or "", end="")
else:
    print(chat_completion_res.choices[0].message.content)
  
  

OpenAI Agents SDK

Build sophisticated multi-agent systems using the OpenAI Agents SDK with GLM-4.5-Air’s 200K context window. Support for agent handoffs, routing logic, and tool integration makes it ideal for complex workflows.

Key Capabilities:

  • Agent handoffs between specialized models
  • Tool integration (web search, code execution, data retrieval)
  • Routing logic based on user intent
  • Persistent 200K token context across agents

Example Use Case: Customer support system with routing agents, knowledge base agents, and escalation agents sharing context through the 200K token window.
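A customer support system along those lines might be wired up as follows. This is a hedged configuration sketch, not a verified integration: it assumes the `openai-agents` package, a `NOVITA_API_KEY` environment variable, and a live endpoint; the agent names and instructions are invented for illustration.

```python
import os
from openai import AsyncOpenAI
from agents import Agent, Runner, OpenAIChatCompletionsModel, set_tracing_disabled

set_tracing_disabled(True)  # tracing expects an OpenAI backend; disable for third-party providers

# Route the Agents SDK through Novita AI's OpenAI-compatible endpoint
novita_client = AsyncOpenAI(
    base_url="https://api.novita.ai/openai",
    api_key=os.environ["NOVITA_API_KEY"],
)
glm_air = OpenAIChatCompletionsModel(
    model="zai-org/glm-4.5-air",
    openai_client=novita_client,
)

# Specialist agents; the router hands off based on user intent
kb_agent = Agent(
    name="Knowledge Base",
    instructions="Answer questions from the product documentation.",
    model=glm_air,
)
escalation_agent = Agent(
    name="Escalation",
    instructions="Collect details needed to hand off to a human agent.",
    model=glm_air,
)
router = Agent(
    name="Router",
    instructions="Route support questions to the right specialist.",
    model=glm_air,
    handoffs=[kb_agent, escalation_agent],
)

result = Runner.run_sync(router, "My invoice total looks wrong.")
print(result.final_output)
```

Because every agent shares the same underlying model and client, handoffs keep the accumulated conversation within GLM-4.5-Air's context window.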

Third-Party Platform Integration

GLM-4.5-Air on Novita AI connects seamlessly with popular development tools through OpenAI-compatible APIs.

Coding Agents

Integrate with AI-powered coding assistants:

  • Claude Code: Use Anthropic-compatible API endpoints
  • Cline, Cursor, Trae, Qwen Code: Full OpenAI SDK compatibility

Orchestration Frameworks

Connect with workflow orchestration tools:

  • LangChain: Use ChatOpenAI with Novita AI base URL
  • Dify: Add Novita AI as custom model provider
  • CrewAI: Configure agents with Novita AI endpoints
  • Langflow: Use OpenAI node with Novita AI configuration

Hugging Face

Novita AI is an official inference provider for Hugging Face, ensuring ecosystem compatibility. Access GLM-4.5-Air through Hugging Face Inference API or transformers library.

Production Best Practices

Context Window Management

With 200K tokens available, optimize context usage:

  • Sliding Window: Retain recent messages, summarize older context
  • Hierarchical Summarization: Compress documents while preserving key information
  • Selective Context: Include only relevant information per query
  • Token Monitoring: Track usage to optimize costs
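The sliding-window strategy above can be sketched in a few lines. The 4-characters-per-token estimate is a rough assumption for illustration; a production implementation should use the model's actual tokenizer:

```python
def trim_to_window(messages, max_tokens=200_000, chars_per_token=4):
    """Keep the system message plus the most recent messages that fit the budget.

    Token counts are approximated as len(content) // chars_per_token; swap in
    a real tokenizer for production use.
    """
    def approx_tokens(msg):
        return max(1, len(msg["content"]) // chars_per_token)

    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    budget = max_tokens - sum(approx_tokens(m) for m in system)
    kept = []
    for msg in reversed(rest):            # walk newest-first
        cost = approx_tokens(msg)
        if cost > budget:
            break
        kept.append(msg)
        budget -= cost
    return system + list(reversed(kept))  # restore chronological order

history = [{"role": "system", "content": "Be a helpful assistant"}] + [
    {"role": "user", "content": "x" * 400} for _ in range(10)
]
trimmed = trim_to_window(history, max_tokens=505)
print(len(trimmed))  # → 6: the system message plus the 5 newest turns that fit
```

Older messages dropped by the window are the natural input for the hierarchical-summarization step: summarize them once and prepend the summary as context.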

Response Streaming

Enable streaming for better user experience:

  • Lower perceived latency
  • Progressive content display
  • Better user engagement
  • Reduced timeout risks for long responses

Thinking Mode vs. Non-Thinking Mode

Select mode based on task complexity:

  • Thinking Mode: Complex reasoning, multi-step problems, tool usage, planning
  • Non-Thinking Mode: Immediate responses, simple queries, low-latency needs

Specify mode via system prompt for optimal performance.
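One way to wire up that selection is a small helper that appends a mode hint to the system prompt. The `/nothink` switch below follows GLM's prompt convention for disabling the thinking phase, but treat it as an assumption and confirm the exact syntax in Novita AI's model documentation:

```python
def build_messages(user_prompt, complex_task, system_content="Be a helpful assistant"):
    """Build a message list that requests thinking or non-thinking mode.

    The "/nothink" hint is GLM's documented convention for skipping the
    thinking phase; verify the exact switch with your provider.
    """
    if not complex_task:
        system_content += " /nothink"
    return [
        {"role": "system", "content": system_content},
        {"role": "user", "content": user_prompt},
    ]

# Simple queries skip the thinking phase for lower latency
msgs = build_messages("What's 2 + 2?", complex_task=False)
print(msgs[0]["content"])  # → Be a helpful assistant /nothink
```

Routing on a simple heuristic (query length, presence of code, tool requirements) is often enough to decide `complex_task` per request.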

Security and Compliance

  • API Key Management: Store in environment variables or secure vaults
  • Rate Limiting: Implement application-level controls
  • Data Privacy: Review Novita AI’s data retention policies
  • Input Validation: Sanitize inputs before API calls
  • License Compliance: MIT license allows commercial use
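The first and fourth points can be sketched with two small helpers: fail fast if the key is missing from the environment, and strip control characters plus cap length before sending user input to the API. The variable name and limits here are illustrative choices, not requirements:

```python
import os

def get_api_key(var="NOVITA_API_KEY"):
    """Fetch the API key from the environment; fail fast if it is missing."""
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"Set {var} before starting the service")
    return key

def sanitize_input(text, max_chars=100_000):
    """Basic input validation: drop control characters and cap the length."""
    cleaned = "".join(ch for ch in text if ch.isprintable() or ch in "\n\t")
    return cleaned[:max_chars]

os.environ.setdefault("NOVITA_API_KEY", "example-key")  # for demonstration only
print(bool(get_api_key()))                # → True
print(sanitize_input("hello\x00world"))   # → helloworld
```

In production, prefer a secrets manager over plain environment variables, and add application-level rate limiting in front of these calls.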

Use Cases and Performance

Intelligent Agents

Hybrid reasoning enables autonomous agents with planning, tool usage, and decision-making. The 200K context retains full conversation history.

Code Generation and Review

With 106B total parameters, the model understands entire codebases. The 200K context processes large files and multiple modules simultaneously.

Long-Document Analysis

Process technical docs, legal contracts, and research papers in their entirety without chunking.

Multilingual Customer Support

Native English and Chinese support with context retention across sessions.

Content Creation

Generate long-form content with consistent style. Extended context maintains narrative coherence.

Performance Metrics

  • Benchmark Score: 59.8
  • Context Window: 200K tokens
  • Active Parameters: 12B (efficient inference)
  • Languages: English and Chinese (native training)
  • Latency: Optimized for production workloads

Model Comparison

GLM-4.5-Air pairs an unusually large 200K-token context window with competitive benchmark performance and cost efficiency.


Conclusion

GLM-4.5-Air on Novita AI delivers hybrid reasoning with a 200K context window, making it ideal for intelligent agents, code generation, and long-document processing. The OpenAI-compatible API enables seamless integration from prototype to production. Start with the Playground to test capabilities, then integrate via the Python API for your applications. Sign up at novita.ai today and begin building with hybrid reasoning and extended context capabilities.

Novita AI is an AI cloud platform that offers developers an easy way to deploy AI models using our simple API, while also providing affordable and reliable GPU cloud for building and scaling.
