Most capable open-source models make you choose: raw intelligence or token efficiency. Thinking models burn 3–5× more tokens per request. Smaller non-reasoning models cut costs but cap capability. Ling-2.6-1T is built to break that tradeoff.
Ling-2.6-1T is a trillion-scale comprehensive flagship model from Ant Group (inclusionAI), designed for immediate task execution. Built on MLA + Hybrid Linear Attention architecture, it achieves a superior intelligence-to-token ratio: strong benchmark performance with minimal output token overhead. On AIME26, it significantly outperforms other non-thinking models. On agent execution benchmarks — SWE-bench Verified, BFCLv4, TAU2-Bench, Claw-Eval — it reaches open-source SOTA. Now exclusively backed by Novita AI as the inference provider.
In short: Ling-2.6-1T delivers comprehensive frontier capability for agent workloads — complex reasoning, tool use, multi-step execution, and long-context instruction following — at a fraction of the token cost of thinking models.
What Is Ling-2.6-1T?
Ling-2.6-1T is the latest flagship model from inclusionAI, the AI research arm of Ant Group (AntLingAGI). It’s a 1-trillion-parameter Mixture-of-Experts model — the largest FP8-trained foundation model released to date — trained on 20T+ high-quality tokens with over 40% reasoning-dense data in later stages.
Unlike thinking models (DeepSeek-R1, QwQ) that output long chain-of-thought traces before answering, Ling-2.6-1T uses a “fast thinking” mechanism: it internalizes reasoning without externalizing verbose thought chains. This keeps token output lean while maintaining strong analytical depth. ~50B parameters activate per token, making inference practical at 1T scale.
- Architecture: MLA + Hybrid Linear Attention, 1T total parameters, ~50B active params per token
- Context window: 262,144 tokens (via YaRN RoPE scaling), max output 32,768 tokens
- Training: FP8 mixed-precision, 20T+ tokens, >40% reasoning-dense data
- Paradigm: Fast-thinking — internalized reasoning, no verbose chain-of-thought output
- License: MIT — fully open weights
- Availability: Exclusively backed by Novita AI (OpenRouter provider)
Key Features: Why Ling-2.6-1T Stands Out
Superior Intelligence-to-Token Ratio
Thinking models produce impressive results but inflate your token bill — hundreds of reasoning tokens before the actual answer. Ling-2.6-1T was trained with Evolutionary Chain-of-Thought (Evo-CoT) in mid-training, internalizing reasoning rather than externalizing it. The result: strong benchmark scores on AIME26 (outperforming other non-thinking models), LiveCodeBench, and Omni-MATH — without paying for the thought process. Per the official model card, it achieves intelligence-output efficiency on par with GPT-5.4 (Non-Reasoning), representing a major leap over its predecessor Ling-1T. For high-throughput production workloads, this directly reduces cost.
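The cost impact is easy to estimate. The sketch below uses Novita AI's listed Ling-2.6-1T rates ($0.30/M input, $2.50/M output) and treats the 4× output multiplier as an illustrative assumption drawn from the typical 3–5× overhead of thinking models, not a measured value:

```python
# Illustrative cost sketch: fast-thinking vs. thinking-model output.
# The token counts and the 4x CoT multiplier are assumptions for
# illustration only; prices are Novita AI's listed Ling-2.6-1T rates.
def request_cost(input_tokens, output_tokens,
                 input_price_per_m=0.30, output_price_per_m=2.50):
    """Cost in USD for one request at the given per-million-token rates."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

direct = request_cost(2_000, 500)        # fast-thinking: answer only
with_cot = request_cost(2_000, 500 * 4)  # same answer plus a reasoning trace
print(f"direct: ${direct:.4f}, with CoT: ${with_cot:.4f}")
```

At scale the gap compounds: across a million requests, the hypothetical reasoning trace above would add several thousand dollars of pure output-token overhead.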
Open-Source SOTA on Agent Execution
Agent workloads require more than math and coding in isolation — they require tool use, multi-step execution, and reliable instruction following under real-world conditions. Ling-2.6-1T reaches open-source SOTA across the key agent benchmarks (per inclusionAI model card):
- SWE-bench Verified — real-world software engineering task resolution
- BFCLv4 — Berkeley Function-Calling Leaderboard v4, complex tool-use
- TAU2-Bench — long-horizon agentic task completion
- Claw-Eval — multi-turn command execution
- PinchBench — composite agent capability evaluation
On LiveCodeBench (Aug 2024–May 2025), it scores 61.68 — outperforming DeepSeek-V3.1 (48.02), Kimi-K2-0905 (48.95), and GPT-5-main (48.57) by 13+ points. For front-end generation, ArtifactsBench score is 59.31 — second only to Gemini-2.5-Pro(lowthink) at 60.28 in this comparison group (per inclusionAI model card).
Long Context + Instruction Following
With a 262,144-token context window (YaRN RoPE scaling), Ling-2.6-1T can hold entire codebases, long documents, or extended multi-turn agent conversations in a single call. On the MRCR benchmark (16K–256K context range), it consistently maintains retrieval accuracy, a critical requirement for agent pipelines that process long tool outputs or document corpora. Its IFBench score of 56.9% demonstrates strong complex instruction following under extended context.
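As a back-of-envelope check before sending a large document in one call, you can budget against the 262,144-token window. The 4-characters-per-token heuristic below is a rough rule of thumb, not the model's actual tokenizer; use a real tokenizer for precise budgeting:

```python
# Rough token budgeting for single-call long-document processing.
# chars_per_token=4 is a common heuristic, not Ling's tokenizer.
CONTEXT_WINDOW = 262_144   # Ling-2.6-1T context window
MAX_OUTPUT = 32_768        # maximum output length

def fits_in_one_call(doc_chars: int, prompt_tokens: int = 500,
                     chars_per_token: int = 4) -> bool:
    """True if prompt + document + a full-length reply fit in the window."""
    doc_tokens = doc_chars // chars_per_token
    return prompt_tokens + doc_tokens + MAX_OUTPUT <= CONTEXT_WINDOW

print(fits_in_one_call(400_000))    # ~100K tokens of document: fits
print(fits_in_one_call(1_200_000))  # ~300K tokens: needs chunking
```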
Benchmark Performance
Independent measurements from Artificial Analysis place Ling-2.6-1T at an Intelligence Index of 33.6 — better than 73% of 495 measured models, and #2 in the open-weights large non-reasoning class. Below are self-reported scores from the inclusionAI model card (comparing against DeepSeek-V3.1-terminus, Kimi-K2-0905, GPT-5-main, and Gemini-2.5-Pro(lowthink)), followed by independently verified AA scores.
Math & Reasoning (per inclusionAI model card)
| Benchmark | Ling-2.6-1T | DeepSeek-V3.1 | Kimi-K2-0905 | GPT-5-main | Gemini-2.5-Pro* |
|---|---|---|---|---|---|
| AIME26 | 70.42 | 55.21 | 50.16 | 59.43 | 70.10 |
| Omni-MATH | 74.46 | 64.77 | 62.42 | 61.09 | 72.02 |
| OptMATH | 57.68 | 35.99 | 35.84 | 39.16 | 42.77 |
| FinanceReasoning | 87.45 | 86.44 | 84.83 | 86.28 | 86.65 |
| BBEH | 47.34 | 42.86 | 34.83 | 39.75 | 29.08 |
| KOR-Bench | 76.00 | 73.76 | 73.20 | 70.56 | 59.68 |
| ARC-AGI-1 | 43.81 | 14.69 | 22.19 | 14.06 | 18.94 |
Code Performance (per inclusionAI model card)
| Benchmark | Ling-2.6-1T | DeepSeek-V3.1 | Kimi-K2-0905 | GPT-5-main | Gemini-2.5-Pro* |
|---|---|---|---|---|---|
| LiveCodeBench | 61.68 | 48.02 | 48.95 | 48.57 | 45.43 |
| MultiPL-E | 77.91 | 77.68 | 73.54 | 76.66 | 71.48 |
| CodeForces Rating | 1901 | 1582 | 1574 | 1120 | 1675 |
| FullStack Bench | 56.55 | 55.48 | 54.00 | 50.92 | 48.19 |
| ArtifactsBench | 59.31 | 43.29 | 44.87 | 41.04 | 60.28 |
| Aider Code Editing | 83.65 | 88.16 | 85.34 | 84.40 | 89.85 |
Agent Execution Benchmarks (per inclusionAI model card)
Ling-2.6-1T reaches open-source SOTA across agent-specific evaluations. Exact competitor scores are not published for all benchmarks; results are listed as reported in the official model card.
| Benchmark | What It Measures | Ling-2.6-1T |
|---|---|---|
| SWE-bench Verified | Real-world GitHub issue resolution | Open-source SOTA |
| BFCLv4 | Complex multi-step function/tool calling | Open-source SOTA |
| TAU2-Bench | Long-horizon agent task completion | Open-source SOTA |
| Claw-Eval | Multi-turn command execution | Open-source SOTA |
| PinchBench | Composite agent capability | Open-source SOTA |
| IFBench | Complex instruction following | 56.9% |
Independent Benchmarks (Artificial Analysis)
| Metric | Ling-2.6-1T | Notes |
|---|---|---|
| AA Intelligence Index | 33.6 | Better than 73% of 495 models |
| AA Coding Index | 33.0 | Better than 78% of models |
| AA Agentic Index | 48.2 | Better than 80% of models |
| GPQA Diamond | 75.2% | Graduate-level scientific reasoning |
| τ²-Bench Telecom | 89.8% | Conversational agent tasks |
| IFBench | 56.9% | Instruction-following |
| Output Speed | 67.7 tok/s | Via Novita AI on OpenRouter |
How to Use Ling-2.6-1T backed by Novita AI
Option 1: Playground (No Code)
Try the model instantly at novita.ai/models/model-detail/inclusionai-ling-2.6-1t — no setup required. Useful for quickly testing prompts before integrating into your app.
Option 2: API (Python)
Ling-2.6-1T is fully OpenAI-compatible. Swap in your Novita API key and the model ID:
from openai import OpenAI

client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",  # Novita's OpenAI-compatible endpoint
    api_key="YOUR_NOVITA_API_KEY",
)

response = client.chat.completions.create(
    model="inclusionai/ling-2.6-1t",
    messages=[{"role": "user", "content": "Your prompt here"}],
    temperature=0.7,
    top_p=0.95,
)
print(response.choices[0].message.content)
Get your API key at novita.ai/settings. The model also supports streaming, function calling via the standard OpenAI tools parameter, and structured output.
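Function-calling requests follow the standard OpenAI tools format. The sketch below only builds the request payload; the get_weather tool and its schema are hypothetical examples, and you would send the payload with the client from the snippet above:

```python
# Builds an OpenAI-format function-calling request for Ling-2.6-1T.
# "get_weather" and its schema are hypothetical; only the request shape
# follows the standard OpenAI tools format.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

request = {
    "model": "inclusionai/ling-2.6-1t",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": tools,
    "tool_choice": "auto",  # let the model decide whether to call the tool
}
# Send with the client configured earlier:
#   response = client.chat.completions.create(**request)
#   calls = response.choices[0].message.tool_calls
```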
Option 3: Third-Party Tools
Since Novita AI is OpenAI-compatible, Ling-2.6-1T works with any tool that accepts a custom base URL — including Cursor, Claude Code, OpenWebUI, LangChain, and LlamaIndex. Set base URL to https://api.novita.ai/v3/openai and model to inclusionai/ling-2.6-1t.
Use Cases
Ling-2.6-1T’s combination of 1T-parameter capacity, fast-thinking paradigm, and 262K context makes it a strong fit for:
- Coding Agents: With a CodeForces rating of 1901 and strong LiveCodeBench scores, it handles competitive-level programming tasks. Pair it with Novita’s Agent Sandbox for fully isolated code execution without managing infrastructure.
- Financial Analysis: 87.45 on FinanceReasoning (#1 in its comparison group per inclusionAI model card) makes it suitable for automated report analysis, earnings summarization, and quantitative research workflows.
- Front-End Generation: The Hybrid Syntax–Function–Aesthetics reward in training specifically targets UI code quality. ArtifactsBench score of 59.31 is the second-highest in its comparison group — only 0.97 points behind Gemini-2.5-Pro(lowthink).
- Long-Document Processing: 262,144-token context handles multi-hundred-page documents, full repository analysis, or extended legal/research corpora in a single call.
- High-Volume Production APIs: Non-reasoning paradigm means predictable token counts and lower latency variance — important when you’re running thousands of requests per day.
Migrating From DeepSeek V3 or Kimi K2?
If you’re currently using DeepSeek V3 or Kimi K2 via another provider, switching to Ling-2.6-1T backed by Novita AI is a one-line change — same OpenAI-compatible API, same request format. The model ID becomes inclusionai/ling-2.6-1t.
On coding tasks, Ling-2.6-1T outperforms both DeepSeek-V3.1 and Kimi-K2-0905 on LiveCodeBench (61.68 vs 48.02 and 48.95), and on math reasoning it leads both on AIME26 and OptMATH. If your workloads are reasoning-heavy but you don’t want chain-of-thought verbosity, this is the cleaner upgrade path versus switching to a thinking model.
Pricing
| Model | Input ($/1M tokens) | Output ($/1M tokens) | Context |
|---|---|---|---|
| Ling-2.6-1T (Novita AI) | $0.30 | $2.50 | 262,144 |
| DeepSeek V3.2 | $0.28 | $0.42 | 128K |
| Qwen3-235B-A22B | $0.455 | $1.82 | 131K |
| Kimi K2 (OpenRouter) | $0.57 | $2.30 | 131K |
Ling-2.6-1T’s output pricing ($2.50/M) is higher than DeepSeek V3.2 — the tradeoff is meaningfully stronger benchmark performance on reasoning and coding tasks. If token cost per call is the primary constraint, Ling-2.6-flash (104B params, 7.4B active) is the cheaper sibling and also exclusively available via Novita AI.
Free tier: Ling-2.6-1T is available for free via the inclusionai/ling-2.6-1t:free endpoint on OpenRouter, exclusively provided by Novita AI. This free window is time-limited — check current availability at openrouter.ai/inclusionai/ling-2.6-1t:free.
Conclusion
Bottom line: Ling-2.6-1T is currently the strongest open-weight non-reasoning model for competitive math and coding benchmarks, and the strongest open-source option if you need 262K context without paying for chain-of-thought verbosity. It’s not the cheapest option per token, but for complex reasoning tasks where thinking models would inflate your bill, it’s the most practical frontier open-source alternative available today.
Exclusively backed by Novita AI — the only provider offering both Ling-2.6-1T and Ling-2.6-flash on OpenRouter — you get a stable inference endpoint, 99.9% uptime, and OpenAI-compatible API without managing the 32-GPU minimum deployment yourself.
FAQ
What is Ling-2.6-1T?
Ling-2.6-1T is a 1-trillion-parameter Mixture-of-Experts language model developed by Ant Group (inclusionAI). It activates roughly 50B parameters per token, supports a 262,144-token context window, and is designed as a fast-thinking, non-reasoning model — strong benchmark performance without chain-of-thought overhead. MIT-licensed and fully open weights.
How do I access Ling-2.6-1T via API?
Set base_url="https://api.novita.ai/v3/openai" and model="inclusionai/ling-2.6-1t" in any OpenAI-compatible client. Get your API key at novita.ai/settings. It’s also accessible via OpenRouter using the same model ID.
How does Ling-2.6-1T compare to DeepSeek V3?
On self-reported benchmarks (inclusionAI model card), Ling-2.6-1T outperforms DeepSeek-V3.1 on AIME26 (70.42 vs 55.21), LiveCodeBench (61.68 vs 48.02), and ARC-AGI-1 (43.81 vs 14.69). DeepSeek V3.2 scores higher on the Artificial Analysis Intelligence Index (42 vs 34), but Ling-2.6-1T offers a larger context window (262K vs 128K) at similar pricing ($0.30/M input).
What is Ling-2.6-1T’s context window?
262,144 tokens (extended from 128K native via YaRN RoPE scaling). Maximum output length is 32,768 tokens.
Is Ling-2.6-1T free to use?
Yes, temporarily. The inclusionai/ling-2.6-1t:free endpoint on OpenRouter is provided exclusively by Novita AI. The free window is time-limited. The paid tier via Novita AI is $0.30/M input and $2.50/M output tokens.
Recommended Articles
- Ling-2.6-flash: 340 Tokens/s, ~7x Efficiency | Novita AI — The smaller sibling — when speed matters more than scale.
- Which Inference Provider Is Right for AI Agents — How to pick an inference API for agentic workloads.
- Top Inference API Providers for Open-Source Models in 2026 — Full comparison of who’s offering what for open-weight models.