How to Use CoBuddy in Claude Code via Novita AI
Step-by-step guide to configure CoBuddy (baidu/cobuddy) in Claude Code using Novita AI's OpenAI-compatible endpoint. API setup, pricing, and coding workflow tips.
Configure DeepSeek V4 Flash in Claude Code via Novita AI. Set env vars, use the Anthropic-compatible endpoint, and cut costs vs Claude Sonnet.
GLM 5.2 is available on Novita AI with 1M context, 128K max output, function calling, structured outputs, and serverless API access.
Kimi K2.7 Code is live on Novita AI with OpenAI-compatible chat API access, 256K context, tool calling, and multimodal inputs.
Nemotron 3 Nano 30B A3B is available on Novita AI as a Serverless LLM with OpenAI-compatible chat completions, 256K context, and pay-as-you-go token pricing.
CoBuddy is available on Novita AI as a coding-focused LLM API for code generation, coding assistants, and AI agent workflows.
Use MiniMax M3 on Novita AI for coding, agentic workflows, 1M-token context, and multimodal input with OpenAI-compatible APIs.
Use Qwen3.6-27B on Novita AI via OpenAI-compatible API. See model ID, pricing, 262K context, coding use cases, and gotchas.
Qwen3.7-Max is available on Novita AI for agentic coding and long-context workflows. Review API access, pricing, limits, and use cases.
PegaFlow external KV cache helps vLLM serving teams preserve and share KV cache across restarts, instances, and RDMA nodes.
DeepSeek-V4-Pro is a 1.6T-parameter open-source MoE model delivering a #1 LiveCodeBench score (93.5) and 1M-token context. Available now via Novita AI.
DeepSeek-V4-Flash is now available via Novita AI. 284B MoE model, 1M token context, selectable reasoning modes. $0.14/M input. OpenAI-compatible API.
Ling-2.6-1T is Ant Group's trillion-scale model built on MLA + Hybrid Linear Attention — not standard MoE. It achieves open-source SOTA on agent benchmarks (SWE-bench, BFCLv4, TAU2
Ling-2.6-flash is a 104B MoE model (7.4B active) delivering 340 tokens/s and 7x better token efficiency than Nemotron-3-Super on agent benchmarks. Available now via OpenRouter with