Most capable open-source models make you choose: raw intelligence or token efficiency. Thinking models burn 3–5× more tokens per request. Smaller non-reasoning models cut costs but cap capability. Ling-2.6-1T is built to break that tradeoff.
Ling-2.6-1T is a trillion-scale comprehensive flagship model from Ant Group (inclusionAI), designed for immediate task execution. Built on MLA + Hybrid Linear Attention architecture, it achieves a superior intelligence-to-token ratio: strong benchmark performance with minimal output token overhead. On AIME26, it significantly outperforms other non-thinking models. On agent execution benchmarks — SWE-bench Verified, BFCLv4, TAU2-Bench, Claw-Eval — it reaches open-source SOTA. Now exclusively backed by Novita AI as the inference provider.
In short: Ling-2.6-1T delivers comprehensive frontier capability for agent workloads — complex reasoning, tool use, multi-step execution, and long-context instruction following — at a fraction of the token cost of thinking models.
What Is Ling-2.6-1T?
Ling-2.6-1T is the latest flagship model from inclusionAI, the AI research arm of Ant Group (AntLingAGI). It’s a 1-trillion-parameter Mixture-of-Experts model — the largest FP8-trained foundation model released to date — trained on 20T+ high-quality tokens with over 40% reasoning-dense data in later stages.
Unlike thinking models (DeepSeek-R1, QwQ) that output long chain-of-thought traces before answering, Ling-2.6-1T uses a “fast thinking” mechanism: it internalizes reasoning without externalizing verbose thought chains. This keeps token output lean while maintaining strong analytical depth. ~50B parameters activate per token, making inference practical at 1T scale.
- Architecture: MLA + Hybrid Linear Attention, 1T total parameters, ~50B active params per token
- Context window: 262,144 tokens (via YaRN RoPE scaling), max output 32,768 tokens
- Training: FP8 mixed-precision, 20T+ tokens, >40% reasoning-dense data
- Paradigm: Fast-thinking — internalized reasoning, no verbose chain-of-thought output
- License: MIT — fully open weights
- Availability: Exclusively backed by Novita AI (OpenRouter provider)
Key Features: Why Ling-2.6-1T Stands Out
Superior Intelligence-to-Token Ratio
Thinking models produce impressive results but inflate your token bill — hundreds of reasoning tokens before the actual answer. Ling-2.6-1T was trained with Evolutionary Chain-of-Thought (Evo-CoT) in mid-training, internalizing reasoning rather than externalizing it. The result: strong benchmark scores on AIME26 (outperforming other non-thinking models), LiveCodeBench, and Omni-MATH — without paying for the thought process. Per the official model card, it achieves intelligence-output efficiency on par with GPT-5.4 (Non-Reasoning), representing a major leap over its predecessor Ling-1T. For high-throughput production workloads, this directly reduces cost.
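The cost impact is easy to estimate. The sketch below uses Novita AI's listed Ling-2.6-1T rates ($0.30/M input, $2.50/M output) and treats the 4× output multiplier as an illustrative assumption drawn from the typical 3–5× overhead of thinking models, not a measured value:

```python
# Illustrative cost sketch: fast-thinking vs. thinking-model output.
# The token counts and the 4x CoT multiplier are assumptions for
# illustration only; prices are Novita AI's listed Ling-2.6-1T rates.
def request_cost(input_tokens, output_tokens,
                 input_price_per_m=0.30, output_price_per_m=2.50):
    """Cost in USD for one request at the given per-million-token rates."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

direct = request_cost(2_000, 500)        # fast-thinking: answer only
with_cot = request_cost(2_000, 500 * 4)  # same answer plus a reasoning trace
print(f"direct: ${direct:.4f}, with CoT: ${with_cot:.4f}")
```

At scale the gap compounds: across a million requests, the hypothetical reasoning trace above would add several thousand dollars of pure output-token overhead.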
Open-Source SOTA on Agent Execution
Agent workloads require more than math and coding in isolation — they require tool use, multi-step execution, and reliable instruction following under real-world conditions. Ling-2.6-1T reaches open-source SOTA across the key agent benchmarks (per inclusionAI model card):
- SWE-bench Verified — real-world software engineering task resolution
- BFCLv4 — Berkeley Function-Calling Leaderboard v4, complex tool-use
- TAU2-Bench — long-horizon agentic task completion
- Claw-Eval — multi-turn command execution
- PinchBench — composite agent capability evaluation
On LiveCodeBench (Aug 2024–May 2025), it scores 61.68 — outperforming DeepSeek-V3.1 (48.02), Kimi-K2-0905 (48.95), and GPT-5-main (48.57) by 13+ points. For front-end generation, ArtifactsBench score is 59.31 — second only to Gemini-2.5-Pro(lowthink) at 60.28 in this comparison group (per inclusionAI model card).
Long Context + Instruction Following
With a 262,144-token context window (YaRN RoPE scaling), Ling-2.6-1T can hold entire codebases, long documents, or extended multi-turn agent conversations in a single call. On the MRCR benchmark (16K–256K context range), it consistently maintains retrieval accuracy, a critical requirement for agent pipelines that process long tool outputs or document corpora. Its IFBench score of 56.9% demonstrates strong complex instruction following under extended context.
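As a back-of-envelope check before sending a large document in one call, you can budget against the 262,144-token window. The 4-characters-per-token heuristic below is a rough rule of thumb, not the model's actual tokenizer; use a real tokenizer for precise budgeting:

```python
# Rough token budgeting for single-call long-document processing.
# chars_per_token=4 is a common heuristic, not Ling's tokenizer.
CONTEXT_WINDOW = 262_144   # Ling-2.6-1T context window
MAX_OUTPUT = 32_768        # maximum output length

def fits_in_one_call(doc_chars: int, prompt_tokens: int = 500,
                     chars_per_token: int = 4) -> bool:
    """True if prompt + document + a full-length reply fit in the window."""
    doc_tokens = doc_chars // chars_per_token
    return prompt_tokens + doc_tokens + MAX_OUTPUT <= CONTEXT_WINDOW

print(fits_in_one_call(400_000))    # ~100K tokens of document: fits
print(fits_in_one_call(1_200_000))  # ~300K tokens: needs chunking
```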
Benchmark Performance
Independent measurements from Artificial Analysis place Ling-2.6-1T at an Intelligence Index of 33.6 — better than 73% of 495 measured models, and #2 in the open-weights large non-reasoning class. Below are self-reported scores from the inclusionAI model card (comparing against DeepSeek-V3.1-terminus, Kimi-K2-0905, GPT-5-main, and Gemini-2.5-Pro(lowthink)), followed by independently verified AA scores.
Math & Reasoning (per inclusionAI model card)
| Benchmark | Ling-2.6-1T | DeepSeek-V3.1 | Kimi-K2-0905 | GPT-5-main | Gemini-2.5-Pro* |
|---|---|---|---|---|---|
| AIME26 | 70.42 | 55.21 | 50.16 | 59.43 | 70.10 |
| Omni-MATH | 74.46 | 64.77 | 62.42 | 61.09 | 72.02 |
| OptMATH | 57.68 | 35.99 | 35.84 | 39.16 | 42.77 |
| FinanceReasoning | 87.45 | 86.44 | 84.83 | 86.28 | 86.65 |
| BBEH | 47.34 | 42.86 | 34.83 | 39.75 | 29.08 |
| KOR-Bench | 76.00 | 73.76 | 73.20 | 70.56 | 59.68 |
| ARC-AGI-1 | 43.81 | 14.69 | 22.19 | 14.06 | 18.94 |
Code Performance (per inclusionAI model card)
| Benchmark | Ling-2.6-1T | DeepSeek-V3.1 | Kimi-K2-0905 | GPT-5-main | Gemini-2.5-Pro* |
|---|---|---|---|---|---|
| LiveCodeBench | 61.68 | 48.02 | 48.95 | 48.57 | 45.43 |
| MultiPL-E | 77.91 | 77.68 | 73.54 | 76.66 | 71.48 |
| CodeForces Rating | 1901 | 1582 | 1574 | 1120 | 1675 |
| FullStack Bench | 56.55 | 55.48 | 54.00 | 50.92 | 48.19 |
| ArtifactsBench | 59.31 | 43.29 | 44.87 | 41.04 | 60.28 |
| Aider Code Editing | 83.65 | 88.16 | 85.34 | 84.40 | 89.85 |
Agent Execution Benchmarks (per inclusionAI model card)
Ling-2.6-1T reaches open-source SOTA across agent-specific evaluations. Exact competitor scores are not published for all benchmarks; results are listed as reported in the official model card.
| Benchmark | What It Measures | Ling-2.6-1T |
|---|---|---|
| SWE-bench Verified | Real-world GitHub issue resolution | Open-source SOTA |
| BFCLv4 | Complex multi-step function/tool calling | Open-source SOTA |
| TAU2-Bench | Long-horizon agent task completion | Open-source SOTA |
| Claw-Eval | Multi-turn command execution | Open-source SOTA |
| PinchBench | Composite agent capability | Open-source SOTA |
| IFBench | Complex instruction following | 56.9% |
Independent Benchmarks (Artificial Analysis)
| Metric | Ling-2.6-1T | Notes |
|---|---|---|
| AA Intelligence Index | 33.6 | Better than 73% of 495 models |
| AA Coding Index | 33.0 | Better than 78% of models |
| AA Agentic Index | 48.2 | Better than 80% of models |
| GPQA Diamond | 75.2% | Graduate-level scientific reasoning |
| τ²-Bench Telecom | 89.8% | Conversational agent tasks |
| IFBench | 56.9% | Instruction-following |
| Output Speed | 67.7 tok/s | Via Novita AI on OpenRouter |
How to Use Ling-2.6-1T backed by Novita AI
Option 1: Playground (No Code)
Try the model instantly at novita.ai/models/model-detail/inclusionai-ling-2.6-1t — no setup required. Useful for quickly testing prompts before integrating into your app.
Option 2: API (Python)
Ling-2.6-1T is fully OpenAI-compatible. Swap in your Novita API key and the model ID:
from openai import OpenAI

client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",  # Novita's OpenAI-compatible endpoint
    api_key="YOUR_NOVITA_API_KEY",
)

response = client.chat.completions.create(
    model="inclusionai/ling-2.6-1t",
    messages=[{"role": "user", "content": "Your prompt here"}],
    temperature=0.7,
    top_p=0.95,
)
print(response.choices[0].message.content)
Get your API key at novita.ai/settings. The model also supports streaming, function calling via the standard OpenAI tools parameter, and structured output.
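Function-calling requests follow the standard OpenAI tools format. The sketch below only builds the request payload; the get_weather tool and its schema are hypothetical examples, and you would send the payload with the client from the snippet above:

```python
# Builds an OpenAI-format function-calling request for Ling-2.6-1T.
# "get_weather" and its schema are hypothetical; only the request shape
# follows the standard OpenAI tools format.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

request = {
    "model": "inclusionai/ling-2.6-1t",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": tools,
    "tool_choice": "auto",  # let the model decide whether to call the tool
}
# Send with the client configured earlier:
#   response = client.chat.completions.create(**request)
#   calls = response.choices[0].message.tool_calls
```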
Option 3: Third-Party Tools
Since Novita AI is OpenAI-compatible, Ling-2.6-1T works with any tool that accepts a custom base URL — including Cursor, Claude Code, OpenWebUI, LangChain, and LlamaIndex. Set base URL to https://api.novita.ai/v3/openai and model to inclusionai/ling-2.6-1t.
Use Cases
Ling-2.6-1T’s combination of 1T-parameter capacity, fast-thinking paradigm, and 262K context makes it a strong fit for:
- Coding Agents: With a CodeForces rating of 1901 and strong LiveCodeBench scores, it handles competitive-level programming tasks. Pair it with Novita’s Agent Sandbox for fully isolated code execution without managing infrastructure.
- Financial Analysis: 87.45 on FinanceReasoning (#1 in its comparison group per inclusionAI model card) makes it suitable for automated report analysis, earnings summarization, and quantitative research workflows.
- Front-End Generation: The Hybrid Syntax–Function–Aesthetics reward in training specifically targets UI code quality. ArtifactsBench score of 59.31 is the second-highest in its comparison group — only 0.97 points behind Gemini-2.5-Pro(lowthink).
- Long-Document Processing: 262,144-token context handles multi-hundred-page documents, full repository analysis, or extended legal/research corpora in a single call.
- High-Volume Production APIs: Non-reasoning paradigm means predictable token counts and lower latency variance — important when you’re running thousands of requests per day.
Migrating From DeepSeek V3 or Kimi K2?
If you’re currently using DeepSeek V3 or Kimi K2 via another provider, switching to Ling-2.6-1T backed by Novita AI is a one-line change — same OpenAI-compatible API, same request format. The model ID becomes inclusionai/ling-2.6-1t.
On coding tasks, Ling-2.6-1T outperforms both DeepSeek-V3.1 and Kimi-K2-0905 on LiveCodeBench (61.68 vs 48.02 and 48.95), and on math reasoning it leads both on AIME26 and OptMATH. If your workloads are reasoning-heavy but you don’t want chain-of-thought verbosity, this is the cleaner upgrade path versus switching to a thinking model.
Pricing
| Model | Input ($/1M tokens) | Output ($/1M tokens) | Context |
|---|---|---|---|
| Ling-2.6-1T (Novita AI) | $0.30 | $2.50 | 262,144 |
| DeepSeek V3.2 | $0.28 | $0.42 | 128K |
| Qwen3-235B-A22B | $0.455 | $1.82 | 131K |
| Kimi K2 (OpenRouter) | $0.57 | $2.30 | 131K |
Ling-2.6-1T’s output pricing ($2.50/M) is higher than DeepSeek V3.2 — the tradeoff is meaningfully stronger benchmark performance on reasoning and coding tasks. If token cost per call is the primary constraint, Ling-2.6-flash (104B params, 7.4B active) is the cheaper sibling and also exclusively available via Novita AI.
Free tier: Ling-2.6-1T is available for free via the inclusionai/ling-2.6-1t:free endpoint on OpenRouter, exclusively provided by Novita AI. This free window is time-limited — check current availability at openrouter.ai/inclusionai/ling-2.6-1t:free.
Conclusion
Bottom line: Ling-2.6-1T is currently the strongest open-weight non-reasoning model for competitive math and coding benchmarks, and the strongest open-source option if you need 262K context without paying for chain-of-thought verbosity. It’s not the cheapest option per token, but for complex reasoning tasks where thinking models would inflate your bill, it’s the most practical frontier open-source alternative available today.
Exclusively backed by Novita AI — the only provider offering both Ling-2.6-1T and Ling-2.6-flash on OpenRouter — you get a stable inference endpoint, 99.9% uptime, and OpenAI-compatible API without managing the 32-GPU minimum deployment yourself.
FAQ
What is Ling-2.6-1T?
Ling-2.6-1T is a 1-trillion-parameter Mixture-of-Experts language model developed by Ant Group (inclusionAI). It activates roughly 50B parameters per token, supports a 262,144-token context window, and is designed as a fast-thinking, non-reasoning model — strong benchmark performance without chain-of-thought overhead. MIT-licensed and fully open weights.
How do I access Ling-2.6-1T via API?
Set base_url="https://api.novita.ai/v3/openai" and model="inclusionai/ling-2.6-1t" in any OpenAI-compatible client. Get your API key at novita.ai/settings. It’s also accessible via OpenRouter using the same model ID.
How does Ling-2.6-1T compare to DeepSeek V3?
On self-reported benchmarks (inclusionAI model card), Ling-2.6-1T outperforms DeepSeek-V3.1 on AIME26 (70.42 vs 55.21), LiveCodeBench (61.68 vs 48.02), and ARC-AGI-1 (43.81 vs 14.69). DeepSeek V3.2 scores higher on the Artificial Analysis Intelligence Index (42 vs 34), but Ling-2.6-1T offers a larger context window (262K vs 128K) at similar pricing ($0.30/M input).
What is Ling-2.6-1T’s context window?
262,144 tokens (extended from 128K native via YaRN RoPE scaling). Maximum output length is 32,768 tokens.
Is Ling-2.6-1T free to use?
Yes, temporarily. The inclusionai/ling-2.6-1t:free endpoint on OpenRouter is provided exclusively by Novita AI. The free window is time-limited. The paid tier via Novita AI is $0.30/M input and $2.50/M output tokens.
Recommended Articles
- Ling-2.6-flash: 340 Tokens/s, ~7x Efficiency | Novita AI — The smaller sibling — when speed matters more than scale.
- Which Inference Provider Is Right for AI Agents — How to pick an inference API for agentic workloads.
- Top Inference API Providers for Open-Source Models in 2026 — Full comparison of who’s offering what for open-weight models.