GLM 5.2 on Novita AI: Long-Context Launch, Pricing, and Developer Fit
GLM 5.2 is available on Novita AI with 1M context, 128K max output, function calling, structured outputs, and serverless API access.
Kimi K2.7 Code is live on Novita AI with OpenAI-compatible chat API access, 256K context, tool calling, and multimodal inputs.
Nemotron 3 Nano 30B A3B is available on Novita AI as a Serverless LLM with OpenAI-compatible chat completions, 256K context, and pay-as-you-go token pricing.
CoBuddy is available on Novita AI as a coding-focused LLM for code generation and AI agent workflows with OpenAI-compatible API access.
Use MiniMax M3 on Novita AI for coding, agentic workflows, 1M-token context, and multimodal input with OpenAI-compatible APIs.
Use Qwen3.6-27B on Novita AI via OpenAI-compatible API. See model ID, pricing, 262K context, coding use cases, and gotchas.
Qwen3.7-Max is available on Novita AI for agentic coding and long-context workflows. Review API access, pricing, limits, and use cases.
PegaFlow external KV cache helps vLLM serving teams preserve and share KV cache across restarts, instances, and RDMA nodes.
DeepSeek-V4-Pro is a 1.6T-parameter open-source MoE model delivering the #1 LiveCodeBench score (93.5) and 1M-token context. Available now via Novita AI.
DeepSeek-V4-Flash is now available via Novita AI. 284B MoE model, 1M token context, selectable reasoning modes. $0.14/M input. OpenAI-compatible API.
Ling-2.6-1T is Ant Group's trillion-scale model built on MLA + Hybrid Linear Attention — not standard MoE. It achieves open-source SOTA on agent benchmarks (SWE-bench, BFCLv4, TAU2
Ling-2.6-flash is a 104B MoE model (7.4B active) delivering 340 tokens/s and 7x better token efficiency than Nemotron-3-Super on agent benchmarks. Available now via OpenRouter with
Kimi K2.6 is now on Novita AI. 1T MoE open-source model, 256K context, 58.6% SWE-Bench Pro — built for long-horizon agentic coding. Try free via OpenAI-compatible API.
GLM-5.1 by Z.ai is now on Novita AI — tops SWE-Bench Pro at 58.4 and runs autonomous coding tasks for 8 hours. Try it with our serverless API, pay per token.