GLM-4.7 on Novita AI: Long-Context Agentic Coding via API


GLM-4.7 is now available on the Novita AI platform, bringing Z.AI’s latest flagship text model to a production-ready, OpenAI-compatible serverless API. GLM-4.7 is optimized for agentic coding, long-horizon planning, and tool-using workflows, with stronger “think → act” reliability and noticeably improved front-end aesthetics for real product delivery.

On Novita AI, you can run GLM-4.7 with a 204,800-token context window, up to 131,072 output tokens, fp8 quantization, and built-in support for Function Calling and Structured Output.

What is GLM-4.7?

GLM-4.7 is Z.AI’s latest flagship text model, with major upgrades focused on advanced coding, long-range task planning, and more reliable tool collaboration—designed to complete tasks end-to-end rather than just generating isolated code snippets.

Core specs (official):

  • Context window: 200K tokens
  • Max output: 128K tokens
  • Capabilities: thinking modes, streaming, function calling, context caching, structured output (JSON), and MCP tool/data-source integration

💡What you get on Novita AI (production-ready serverless):

Item | Details
Model | GLM-4.7
Context length | 204,800 tokens
Max output | 131,072 tokens
Quantization | fp8
Function Calling / Structured Output | Supported

Why GLM-4.7 on Novita AI

Transparent serverless pricing (pay per token)

On Novita AI, GLM-4.7 runs as a serverless endpoint with clear, per-token billing:

  • Input: $0.6 / 1M tokens
  • Cache Read: $0.11 / 1M tokens
  • Output: $2.2 / 1M tokens

That Cache Read line matters: it enables cost-efficient long-horizon workflows (think “agent working across a large repo/spec over many turns”). See the Novita AI pricing page for full details.
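To see why cache reads matter, here is a back-of-the-envelope cost estimate using the per-token rates above. The session sizes (a ~100K-token repo snapshot re-read over 50 turns) are illustrative assumptions, not measurements:

```python
# Rough cost estimate for a long-horizon agent session on GLM-4.7 (Novita AI rates).
# Prices per 1M tokens: input $0.60, cache read $0.11, output $2.20.
INPUT_PER_M = 0.60
CACHE_READ_PER_M = 0.11
OUTPUT_PER_M = 2.20

def session_cost(input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """Return the USD cost; `cached_tokens` are input tokens served from cache."""
    fresh = input_tokens - cached_tokens
    return (fresh * INPUT_PER_M
            + cached_tokens * CACHE_READ_PER_M
            + output_tokens * OUTPUT_PER_M) / 1_000_000

# Hypothetical 50-turn agent loop re-reading a ~100K-token repo snapshot each turn,
# with ~2K output tokens per turn; after turn 1 the snapshot is a cache hit.
with_cache = session_cost(50 * 100_000, 49 * 100_000, 50 * 2_000)
without_cache = session_cost(50 * 100_000, 0, 50 * 2_000)
print(f"with cache: ${with_cache:.2f}, without: ${without_cache:.2f}")
```

Under these assumptions the cached session costs roughly a quarter of the uncached one, which is the whole point of keeping large, stable context in the prefix.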

OpenAI-compatible API for instant integration

If you already use OpenAI’s Chat Completions style APIs, you can migrate by setting Novita’s base URL and switching the model name—no new protocol to learn.

Built for agentic delivery

Z.AI positions GLM-4.7 around “task completion,” with stronger instruction following during tool use and improved stability for complex agent loops.

GLM-4.7 Capabilities & Benchmarks

GLM-4.7 is designed around agentic coding (shipping tasks end-to-end), stronger reasoning with controllable thinking, and more reliable tool-using workflows—with a noticeable jump in web/UI generation quality (“vibe coding”).

Capabilities

  • Agentic Coding, end-to-end: better at planning, implementing, and iterating across multi-file projects and real agent frameworks.
  • Thinking before acting (more stable agents): improved instruction-following and complex-task stability; supports turn-level control to balance cost/latency vs. reliability.
  • Tool Using & Web browsing: stronger tool execution patterns and browsing-style tasks.
  • Complex Reasoning uplift: measurable gains on hard reasoning evaluations (including tool-augmented settings).
  • Vibe Coding (UI & slides quality): cleaner modern webpages and better-looking slides/layout.

Standardized Benchmarks

The following scores are reported by Z.AI:

Category | Benchmark | GLM-4.7
Coding (real bugfix) | SWE-bench Verified | 73.8
Agentic / terminal | Terminal Bench 2.0 | 41.0
Coding (live) | LiveCodeBench v6 | 84.9
Tool use (interactive) | τ²-Bench | 87.4
Web browsing | BrowseComp | 52.0 (67.5 w/ context management)
Reasoning (tools) | HLE (w/ Tools) | 42.8
[Figure] Benchmark chart “LLM Performance Evaluation: Agentic, Reasoning and Coding”: GLM-4.7 vs. GLM-4.6, DeepSeek-V3.2, Claude Sonnet 4.5, and GPT-5.1 (High) across eight benchmarks (AIME 25, LiveCodeBench v6, GPQA-Diamond, HLE, SWE-bench Verified, Terminal Bench 2.0, τ²-Bench, BrowseComp) at 128K context. GLM-4.7 leads on SWE-bench Verified (73.8), Terminal Bench 2.0 (41.0), and τ²-Bench (87.4), and scores 42.8 on HLE with tools.

LMArena “Human Preference” Signal

LMArena rankings are based on blind user votes and are a useful “how it feels” complement to benchmarks.

  • WebDev Leaderboard: GLM-4.7 is #6 with Score 1447 (+10/-10), 4,833 votes (last updated Jan 16, 2026).
  • Text Arena (Overall): GLM-4.7 is #18 with Score 1443 (±7), 8,258 votes (last updated Jan 12, 2026).

🏆Open-model positioning: on both leaderboards, the models ranked above GLM-4.7 carry proprietary licenses, while GLM-4.7 is MIT-licensed, making it the highest-ranked open-license model in both WebDev and Text (Overall) at the time of those leaderboard updates.

Getting Started with GLM-4.7 on Novita AI

Option A: Use the Playground

The easiest way to get to know GLM-4.7 is to try it in the Novita AI Playground: no setup, no code. Just sign up, open the Playground, and test prompts in real time. New accounts receive free credits after registration, so you can try the model right away.

Option B: Integrate via API

Connect GLM-4.7 to your applications using Novita AI’s unified REST API.

Getting Your API Key on Novita AI

  • Step 1: Create or Log In to Your Account

Visit https://novita.ai and sign up or log in to your existing account.

  • Step 2: Navigate to Key Management

After logging in, open the “API Keys” page.

  • Step 3: Create a New Key

Click the “Add New Key” button.

  • Step 4: Save Your Key Immediately

Copy and store the key as soon as it is generated; it is usually shown only once and cannot be retrieved later. Keep it in a secure location such as a password manager or encrypted notes.
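Rather than hardcoding the key in source files, a common pattern is to read it from an environment variable. The variable name `NOVITA_API_KEY` below is just a convention for this sketch, not something the API requires:

```python
import os

def load_api_key(var: str = "NOVITA_API_KEY") -> str:
    """Read the API key from the environment; fail fast with a clear error if missing."""
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"Set {var} before running, e.g. export {var}=<your key>")
    return key

# Demo only: inject a dummy value so the lookup succeeds in this sketch.
os.environ.setdefault("NOVITA_API_KEY", "dummy-key-for-demo")
print(load_api_key())
```

Failing fast with a clear message beats a cryptic 401 from the API several calls later.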

Direct API Integration

from openai import OpenAI

# Point the standard OpenAI SDK at Novita's OpenAI-compatible endpoint.
client = OpenAI(
    api_key="<Your API Key>",
    base_url="https://api.novita.ai/openai"
)

response = client.chat.completions.create(
    model="zai-org/glm-4.7",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, how are you?"}
    ],
    max_tokens=131072,  # cap at the model's maximum output length
    temperature=0.7
)

print(response.choices[0].message.content)

Multi-Agent Workflows with OpenAI Agents SDK

Build sophisticated agent systems with plug-and-play integration—supporting handoffs, routing, and tool use via native function calling, plus the full long-context window for complex, multi-step tasks.

Option C: Connect with Third-Party Platforms

If you’re already building with agent frameworks or developer tools, Novita AI is designed to plug in with minimal friction:

  • Agent frameworks & app builders: Follow Novita’s step-by-step integration guides to connect with popular tooling such as Continue, AnythingLLM, LangChain, and Langflow.
  • Hugging Face Hub: Novita is listed as an Inference Provider on Hugging Face, so you can run supported models through Hugging Face’s provider workflow and ecosystem.
  • OpenAI-compatible API: Novita’s LLM endpoints are compatible with the OpenAI API standard, making it easy to migrate existing OpenAI-style apps and connect many OpenAI-compatible tools (Cline, Cursor, Trae, and Qwen Code).
  • Anthropic-compatible API (Claude Code workflows): Novita also provides Anthropic SDK–compatible access so you can integrate Novita-backed models into Claude Code style agentic coding workflows.
  • OpenCode (Built-in provider): Novita AI is now integrated directly into OpenCode as a supported provider, so users can select Novita in OpenCode without manual configuration.

Production Patterns

  1. Use Prompt Cache for long-horizon agents

If you run multi-turn workflows on large, stable context (repo snapshot, long spec, design doc), caching can significantly reduce cost—Novita exposes Cache Read pricing explicitly.
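A sketch of a cache-friendly prompt layout, assuming the cache matches on stable prefixes (the helper below is illustrative, not part of any SDK): keep the large, unchanging context at the front of every request, and append only the new turns.

```python
def build_messages(repo_snapshot: str, history: list[dict], user_turn: str) -> list[dict]:
    """Keep the big, stable context first so repeated turns can hit the prompt cache."""
    return (
        [{"role": "system", "content": "You are a coding agent.\n\n" + repo_snapshot}]
        + history  # prior turns, appended in order after the stable prefix
        + [{"role": "user", "content": user_turn}]
    )

msgs = build_messages("<100K-token repo snapshot>", [], "Fix the failing test in utils.py")
print(msgs[0]["role"], len(msgs))
```

The key design choice is that nothing before the history ever changes between turns; reordering or editing the snapshot mid-session would invalidate the cached prefix.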

  2. Structured Output for reliable pipelines

When integrating with workflow engines, validators, or UIs, prefer JSON-structured outputs (schema-driven) to reduce parsing edge cases. Novita lists Structured Output as supported for GLM-4.7.
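As a sketch, here is an OpenAI-style request body asking for schema-constrained JSON. The `json_schema` response format follows the OpenAI convention, and the bugfix-report schema is invented for illustration; check Novita’s docs for the exact shape it accepts:

```python
import json

# Hypothetical schema for a bugfix report that downstream tooling can validate.
schema = {
    "type": "object",
    "properties": {
        "file": {"type": "string"},
        "severity": {"type": "string", "enum": ["low", "medium", "high"]},
        "patch": {"type": "string"},
    },
    "required": ["file", "severity", "patch"],
}

request_body = {
    "model": "zai-org/glm-4.7",
    "messages": [{"role": "user", "content": "Summarize the bug and propose a patch as JSON."}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {"name": "bugfix_report", "schema": schema},
    },
}

# Downstream, parse the model's reply instead of regex-scraping free text:
sample_reply = '{"file": "utils.py", "severity": "high", "patch": "..."}'
report = json.loads(sample_reply)
print(report["severity"])
```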

  3. Function Calling for tool-augmented coding

Wrap your tools as functions: repo search, ticket lookup, CI trigger, database read, web fetch—then let the model decide when to call them. GLM-4.7 is explicitly designed for stronger tool collaboration.
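A minimal function-calling sketch in the OpenAI-compatible `tools` format. The `search_repo` tool and its dispatch table are hypothetical; in a real loop you would send `tools` with the request and execute whatever `tool_calls` come back on the response:

```python
import json

# Tool schema advertised to the model (OpenAI-compatible `tools` format).
tools = [{
    "type": "function",
    "function": {
        "name": "search_repo",
        "description": "Search the repository for a symbol or string.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

def search_repo(query: str) -> str:
    """Stand-in implementation; a real one would grep the codebase."""
    return f"3 matches for {query!r}"

DISPATCH = {"search_repo": search_repo}

def run_tool_call(name: str, arguments_json: str) -> str:
    """Execute one tool call as it would arrive in `tool_calls` on the response."""
    args = json.loads(arguments_json)
    return DISPATCH[name](**args)

print(run_tool_call("search_repo", '{"query": "parse_config"}'))
```

Each tool result then goes back to the model as a `tool` role message, closing the act-observe loop.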

  4. Thinking mode policy: “fast by default, deep when needed”
  • trivial Q&A / formatting: thinking off
  • debugging / multi-step refactors: thinking on
  • long tasks: consider modes that improve stability and cache hit rate
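This routing policy can be sketched as a small helper. The `thinking` request field shown is an assumption modeled on Z.AI’s `extra_body` convention, so verify the exact parameter name and values against Novita’s GLM-4.7 docs before relying on it:

```python
def thinking_config(task_kind: str) -> dict:
    """Map a task type to a request-level thinking toggle (field name assumed)."""
    deep = task_kind in {"debugging", "refactor", "long_task"}
    return {"thinking": {"type": "enabled" if deep else "disabled"}}

# With the OpenAI SDK, this would be passed per request, e.g.:
# client.chat.completions.create(..., extra_body=thinking_config("debugging"))
print(thinking_config("formatting"))
```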

Conclusion

GLM-4.7 brings a practical set of upgrades for developers building agentic coding and long-horizon tool-using workflows: 200K context, controllable thinking, stronger function calling behavior, and better front-end “vibe coding” outputs.

On Novita AI, you can start immediately with an OpenAI-compatible serverless API, with transparent token pricing and built-in support for function calling and structured outputs—ready for production-grade agent pipelines.

Novita AI is an AI cloud platform that offers developers an easy way to deploy AI models using our simple API, while also providing affordable and reliable GPU cloud for building and scaling.

Frequently Asked Questions

What is GLM-4.7?

GLM-4.7 is Z.AI’s flagship LLM, positioned for enhanced programming and more stable multi-step reasoning/execution, and it is released with an official open-weights model (available on Hugging Face).

Is GLM-4.7 free?

On Novita AI, GLM-4.7 is pay-per-token: $0.6/M tokens (input), $0.11/M tokens (cache read), and $2.2/M tokens (output).
On Z.ai, access is commonly packaged via a paid Coding Plan (starting at $3/month).
Some platforms, including Novita AI, offer limited free trials or quotas, but GLM-4.7 itself isn’t universally free.

Is GLM-4.7 really good?

For coding and agentic workflows, it is positioned as a top-tier open model by its publisher. Z.AI reports strong results on coding and agent benchmarks (e.g., LiveCodeBench v6, SWE-bench Verified, BrowseComp, τ²-Bench), and frames it as competitive with Claude Sonnet 4.5 on several measurements.

