PegaFlow External KV Cache for vLLM
PegaFlow external KV cache helps vLLM serving teams preserve and share KV cache across restarts, instances, and RDMA nodes.
DeepSeek-V4-Pro is a 1.6T-parameter open-source MoE model with a #1 LiveCodeBench score (93.5) and a 1M-token context. Available now via Novita AI.
DeepSeek-V4-Flash is now available via Novita AI: a 284B MoE model with a 1M-token context and selectable reasoning modes. $0.14/M input tokens. OpenAI-compatible API.
Ling-2.6-1T is Ant Group’s trillion-scale model built on MLA + Hybrid Linear Attention — not standard MoE. It achieves open-source SOTA on agent benchmarks (SWE-bench, BFCLv4, TAU2-Bench) with minimal token overhead, now exclusively backed by Novita AI.
Ling-2.6-flash is a 104B MoE model (7.4B active) delivering 340 tokens/s and ~7x better token efficiency than Nemotron-3-Super on agent benchmarks. Available now via OpenRouter with Novita BYOK.
Kimi K2.6 is now on Novita AI. 1T MoE open-source model, 256K context, 58.6% SWE-Bench Pro — built for long-horizon agentic coding. Try free via OpenAI-compatible API.
GLM-5.1 by Z.ai is now on Novita AI — tops SWE-Bench Pro at 58.4 and runs autonomous coding tasks for 8 hours. Try it with our serverless API, pay per token.
Gemma 4 is now available on Novita AI — 4 model sizes, 3 architectures, vision support across the lineup, audio support on E2B and E4B.
Access Kimi K2.5 on Novita AI – Moonshot AI’s flagship multimodal agentic model with 256K context, vision+text, thinking modes & agent swarm.
Explore the advantages of Speech 2.6 for TTS and voice agents. Discover how it boosts productivity and efficiency in applications.
Explore MiniMax Speech 2.5, a solution for high-accuracy voice cloning with fast response times and multilingual support.
Access GLM-4.6V API on Novita AI: 106B vision-language model with 128K context, native function calling, and SoTA multimodal document understanding.