Qwen 3.5 Medium Model Series on Novita AI: Frontier Intelligence at a Fraction of the Cost

Table Of Contents

What Is the Qwen 3.5 Medium Series?
Benchmark Performance
What Makes It Special on Novita AI
Cost Analysis: How Much Can You Save?
Use Cases and Best Practices
Which Model Should You Pick?
How to Get Started
Conclusion

Alibaba’s Qwen 3.5 Medium series brings frontier-level reasoning to open-source models you can actually afford to run in production. Three models — Qwen3.5-35B-A3B, Qwen3.5-27B, and Qwen3.5-122B-A10B — are now live on Novita AI, offering GPT-5-mini-class performance with the flexibility of open weights and Apache 2.0 licensing.

🎉All three models are already accessible through Novita AI’s serverless LLM API — no GPU provisioning required.

What Is the Qwen 3.5 Medium Series?

On February 24, 2026, Alibaba’s Qwen team released the Qwen 3.5 Medium model series: four models that sit between the flagship Qwen3.5-397B-A17B and smaller distilled variants. Three are open-weight under Apache 2.0, and all three are now available on Novita AI; the fourth, Qwen3.5-Flash, is a proprietary hosted model available only through Alibaba Cloud.

The series targets a specific gap in the market: models compact enough for cost-sensitive production workloads, yet powerful enough to rival proprietary frontier models like GPT-5 mini and Claude Sonnet 4.5.


Model	Total Params	Active Params	Architecture	Context
Qwen3.5-35B-A3B	35B	3B	MoE + Hybrid Attention	262K
Qwen3.5-27B	27B	27B (dense)	Dense + Hybrid Attention	262K
Qwen3.5-122B-A10B	122B	10B	MoE + Hybrid Attention	262K
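To make the gap between total and active parameters concrete, here is a minimal sketch that computes the active share for each model in the table. The compute figure uses the common rule of thumb of about 2 FLOPs per active parameter per token, which is an assumption for illustration and not a number published by Qwen or Novita AI.

```python
# Parameter counts copied from the table above, in billions.
MODELS = {
    "Qwen3.5-35B-A3B": {"total_b": 35, "active_b": 3},
    "Qwen3.5-27B": {"total_b": 27, "active_b": 27},
    "Qwen3.5-122B-A10B": {"total_b": 122, "active_b": 10},
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the weights that take part in each forward pass."""
    return active_b / total_b


def approx_gflops_per_token(active_b: float) -> float:
    """Rule-of-thumb estimate (assumption): ~2 FLOPs per active parameter per token."""
    return 2.0 * active_b  # billions of parameters -> GFLOPs


for name, params in MODELS.items():
    share = active_fraction(params["total_b"], params["active_b"])
    gflops = approx_gflops_per_token(params["active_b"])
    print(f"{name}: {share:.1%} of weights active, ~{gflops:.0f} GFLOPs per token")
```

Under this estimate the two MoE models activate under a tenth of their weights per token, while the dense 27B uses all of them.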

Benchmark Performance

The Qwen 3.5 Medium models punch well above their weight class. Here’s how they stack up against GPT-5 mini across key categories (data from Qwen’s official benchmark results):

Knowledge & Reasoning


Benchmark	122B-A10B	27B	35B-A3B	GPT-5 mini
MMLU-Pro	86.7	86.1	85.3	83.7
GPQA Diamond	86.6	85.5	84.2	82.8
MMMLU	86.7	85.9	85.2	86.2
HMMT Feb 2025	91.4	92.0	89.0	89.2

Coding


Benchmark	122B-A10B	27B	35B-A3B	GPT-5 mini
SWE-bench Verified	72.0	72.4	69.2	72.0
Terminal-Bench 2	49.4	41.6	40.5	31.9
LiveCodeBench v6	78.9	80.7	74.6	80.5

Agentic Tasks


Benchmark	122B-A10B	27B	35B-A3B	GPT-5 mini
BFCL-V4 (Tool Use)	72.2	68.5	67.3	55.5
BrowseComp	63.8	61.0	61.0	48.1
TAU2-Bench	79.5	79.0	81.2	69.8

🔑 The standout: All three Medium models outperform GPT-5 mini on every agentic benchmark above, by roughly 9 to 17 points (about 13% to 33% in relative terms). On general knowledge and coding, they’re competitive or ahead. The key differentiator isn’t raw benchmark scores alone: you get this performance level with open weights, fine-tuning freedom, and no vendor lock-in.
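The size of the agentic lead can be checked directly from the table. The sketch below only reuses the scores listed above and computes each model’s point gap and relative gap against GPT-5 mini; the function and variable names are illustrative.

```python
# Agentic scores copied from the table above.
AGENTIC = {
    "BFCL-V4": {"122B-A10B": 72.2, "27B": 68.5, "35B-A3B": 67.3, "GPT-5 mini": 55.5},
    "BrowseComp": {"122B-A10B": 63.8, "27B": 61.0, "35B-A3B": 61.0, "GPT-5 mini": 48.1},
    "TAU2-Bench": {"122B-A10B": 79.5, "27B": 79.0, "35B-A3B": 81.2, "GPT-5 mini": 69.8},
}
BASELINE = "GPT-5 mini"


def gains(scores: dict) -> dict:
    """Return {model: (point_gap, relative_gap_percent)} against the baseline."""
    base = scores[BASELINE]
    return {
        model: (score - base, (score - base) / base * 100)
        for model, score in scores.items()
        if model != BASELINE
    }


all_gains = []
for benchmark, scores in AGENTIC.items():
    for model, (points, percent) in gains(scores).items():
        all_gains.append((points, percent))
        print(f"{benchmark:11s} {model:10s} +{points:.1f} pts  (+{percent:.1f}%)")

print(f"point gap range:    {min(p for p, _ in all_gains):.1f} to {max(p for p, _ in all_gains):.1f}")
print(f"relative gap range: {min(r for _, r in all_gains):.1f}% to {max(r for _, r in all_gains):.1f}%")
```

The smallest gaps are on TAU2-Bench and the largest on BFCL-V4 and BrowseComp.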

What Makes It Special on Novita AI

Novita AI offers all three open-weight Qwen 3.5 Medium models via serverless API with OpenAI-compatible endpoints. Prices are in USD per million tokens (Mt):


Model	Input	Output	Context	Max Output
Qwen3.5-35B-A3B	$0.25/Mt	$2.00/Mt	262K	65K
Qwen3.5-27B	$0.30/Mt	$2.40/Mt	262K	65K
Qwen3.5-122B-A10B	$0.40/Mt	$3.20/Mt	262K	65K

Key advantages on Novita AI:

OpenAI-compatible API: Drop-in replacement — switch from GPT endpoints with a base URL change.
Full 262K context: No truncation. Use the full native context window.
Serverless: No GPU provisioning, no cold starts to manage.

Want to see how these models handle your workload? Open the Playground and run a test →

Cost Analysis: How Much Can You Save?

Imagine you’re running an agentic coding workflow that processes 1M input tokens and generates 200K output tokens per day.


Model	Daily Input Cost	Daily Output Cost	Daily Total	Monthly (30d)
Qwen3.5-35B-A3B (Novita)	$0.25	$0.40	$0.65	$19.50
Qwen3.5-27B (Novita)	$0.30	$0.48	$0.78	$23.40
Qwen3.5-122B-A10B (Novita)	$0.40	$0.64	$1.04	$31.20
GPT-5 mini (OpenAI)	$0.25	$0.40	$0.65	$19.50
Claude Sonnet 4.5 (Anthropic)	$3.00	$3.00	$6.00	$180.00
GPT-5.2 (OpenAI)	$1.75	$2.80	$4.55	$136.50

Pricing sources: OpenAI (GPT-5 mini: $0.25/Mt input, $2.00/Mt output; GPT-5.2: $1.75/Mt input, $14.00/Mt output), Anthropic (Claude Sonnet 4.5: $3.00/Mt input, $15.00/Mt output), Novita AI.

🔑 The real story: Qwen3.5-35B-A3B on Novita AI matches GPT-5 mini’s price point while delivering significantly stronger agentic performance (BFCL-V4: 67.3 vs 55.5). On the workload above, it costs about 7× less than GPT-5.2 ($0.65 vs $4.55 per day) and about 9× less than Claude Sonnet 4.5 ($0.65 vs $6.00 per day). The value proposition isn’t just price: it’s open weights at a closed-model price, with better agentic capabilities.
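The cost table can be reproduced, and adapted to your own traffic, with a few lines of arithmetic. This is a minimal sketch: the prices are the per-million-token figures quoted in this article, and the `Price` class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    input_per_mt: float   # USD per million input tokens
    output_per_mt: float  # USD per million output tokens


# Prices as quoted in this article.
PRICES = {
    "Qwen3.5-35B-A3B (Novita)": Price(0.25, 2.00),
    "Qwen3.5-27B (Novita)": Price(0.30, 2.40),
    "Qwen3.5-122B-A10B (Novita)": Price(0.40, 3.20),
    "GPT-5 mini (OpenAI)": Price(0.25, 2.00),
    "Claude Sonnet 4.5 (Anthropic)": Price(3.00, 15.00),
    "GPT-5.2 (OpenAI)": Price(1.75, 14.00),
}


def daily_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one day of traffic."""
    return (input_tokens / 1e6) * price.input_per_mt + (output_tokens / 1e6) * price.output_per_mt


def monthly_cost(price: Price, input_tokens: int, output_tokens: int, days: int = 30) -> float:
    """Cost in USD for `days` days of identical traffic."""
    return days * daily_cost(price, input_tokens, output_tokens)


# The workload from the example: 1M input and 200K output tokens per day.
for name, price in PRICES.items():
    day = daily_cost(price, 1_000_000, 200_000)
    month = monthly_cost(price, 1_000_000, 200_000)
    print(f"{name:32s} ${day:6.2f}/day  ${month:7.2f}/month")
```

Changing the two token counts shows how the comparison shifts for output-heavy workloads, where the gap in output pricing dominates.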

Use Cases and Best Practices

Suppose you’re building an AI-powered code review agent that scans pull requests, identifies issues, and suggests fixes. The agent needs to process entire repository contexts (50K–200K tokens), call external tools (linters, test runners), and generate structured feedback.

Here’s why Qwen 3.5 Medium is a strong fit:

1. Agentic Tool-Calling Pipelines

The 122B-A10B scores 72.2 on BFCL-V4 against 55.5 for GPT-5 mini, so these models are well suited to structured function calling. Build multi-step agents that chain API calls, parse responses, and make decisions.
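As a rough illustration of what the tool-calling side of such an agent looks like, the sketch below defines one tool in the OpenAI function-calling format and dispatches a tool call locally. The `run_linter` tool, its canned result, and the example call are hypothetical; whether a given endpoint accepts the `tools` parameter for these models should be confirmed in the provider documentation.

```python
import json


def run_linter(path: str) -> dict:
    """Hypothetical stand-in for a real linter; returns canned findings."""
    return {"path": path, "issues": [{"line": 12, "message": "unused import"}]}


# Tool description in the OpenAI function-calling schema, passed as `tools=TOOLS`.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "run_linter",
        "description": "Lint one file and return the issues found.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

REGISTRY = {"run_linter": run_linter}


def handle_tool_call(call: dict) -> dict:
    """Run one model-requested tool call and build the `tool` message to send back."""
    name = call["function"]["name"]
    if name not in REGISTRY:
        result = {"error": f"unknown tool: {name}"}
    else:
        try:
            # Arguments arrive as a JSON string and may be malformed.
            args = json.loads(call["function"]["arguments"])
            result = REGISTRY[name](**args)
        except (json.JSONDecodeError, TypeError) as exc:
            result = {"error": str(exc)}
    return {"role": "tool", "tool_call_id": call["id"], "content": json.dumps(result)}


# A tool call shaped like the entries of `message.tool_calls` (made-up values).
example_call = {
    "id": "call_1",
    "type": "function",
    "function": {"name": "run_linter", "arguments": '{"path": "src/app.py"}'},
}
reply = handle_tool_call(example_call)
print(reply)
```

Returning errors as tool output, instead of raising, lets the model see the failure and retry with corrected arguments.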

2. Long-Context Code Analysis

The 262K native context window means you can feed entire codebases without chunking. The hybrid attention architecture keeps costs manageable even at high token counts.
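Before sending a whole repository, it helps to estimate whether it fits. The sketch below is a rough pre-flight check under stated assumptions: "262K" is read as 262,144 tokens, the 4-characters-per-token ratio is a coarse heuristic and not the model’s tokenizer, and the output reservation assumes prompt and completion share the same window.

```python
import math

CONTEXT_WINDOW = 262_144  # assumption: "262K" read as 2**18 tokens
CHARS_PER_TOKEN = 4.0     # coarse heuristic for English text and code


def estimate_tokens(text: str) -> int:
    """Very rough token estimate; use the real tokenizer for billing-grade numbers."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_in_context(texts: list, reserved_output: int = 8_192) -> tuple:
    """Return (fits, estimated_prompt_tokens, prompt_budget)."""
    used = sum(estimate_tokens(t) for t in texts)
    budget = CONTEXT_WINDOW - reserved_output
    return used <= budget, used, budget


# Example with synthetic "files" of 400 and 1,200,000 characters.
small_ok, small_used, budget = fits_in_context(["x" * 400])
large_ok, large_used, _ = fits_in_context(["x" * 1_200_000])
print(small_ok, small_used, budget)
print(large_ok, large_used)
```

If the estimate lands near the budget, counting tokens with the model’s actual tokenizer is the safer check.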

3. Open-Weight Flexibility

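Unlike GPT-5 mini, you can fine-tune Qwen 3.5 Medium on your own data, deploy it on-premise for compliance requirements, or run quantized versions locally on consumer GPUs. The 35B-A3B is reported to run on GPUs with 8GB+ VRAM using 4-bit quantization; since 35B parameters at 4 bits come to roughly 17.5 GB of weights, that figure presumably assumes part of the model is offloaded to system RAM.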
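For local deployment planning, a back-of-envelope estimate of weight memory is a useful first check. The sketch below counts weights only, ignoring the KV cache, activations, and quantization overhead, so real usage will be higher. For MoE models it uses total parameters, since every expert has to be stored even though only a few are active per token.

```python
def weight_memory_gb(total_params_b: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return total_params_b * bits_per_weight / 8


TOTAL_PARAMS_B = {"Qwen3.5-35B-A3B": 35, "Qwen3.5-27B": 27, "Qwen3.5-122B-A10B": 122}

for name, params_b in TOTAL_PARAMS_B.items():
    sizes = ", ".join(
        f"{bits}-bit: {weight_memory_gb(params_b, bits):.1f} GB" for bits in (16, 8, 4)
    )
    print(f"{name}: {sizes}")
```

Whatever does not fit in VRAM has to be held in system RAM or on disk, at a cost in speed; how much offloading a particular runtime supports is outside the scope of this estimate.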

Which Model Should You Pick?

Quick guide:

Qwen3.5-35B-A3B → Best overall pick. Lowest cost, runs on consumer GPUs locally (8GB+ VRAM with quantization), and still beats GPT-5 mini on 6 of the 10 benchmarks above.
Qwen3.5-27B → Highest per-token reasoning density (all 27B params active). Best SWE-bench Verified (72.4), LiveCodeBench v6 (80.7), and HMMT (92.0) scores in the series. Ideal when you need maximum reasoning per forward pass.
Qwen3.5-122B-A10B → Top scores on BFCL-V4 and BrowseComp, plus the best Terminal-Bench 2 result (the 35B-A3B edges it on TAU2-Bench). Best for complex multi-step agent workflows where tool-calling accuracy is critical.
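To see where each model leads, the benchmark tables above can be tallied programmatically. This sketch reuses only the published scores from this article; the helper names are illustrative.

```python
QWEN = ("122B-A10B", "27B", "35B-A3B")
BASELINE = "GPT-5 mini"
COLUMNS = QWEN + (BASELINE,)

# Scores copied from the three benchmark tables, in COLUMNS order.
SCORES = {
    "MMLU-Pro": (86.7, 86.1, 85.3, 83.7),
    "GPQA Diamond": (86.6, 85.5, 84.2, 82.8),
    "MMMLU": (86.7, 85.9, 85.2, 86.2),
    "HMMT Feb 2025": (91.4, 92.0, 89.0, 89.2),
    "SWE-bench Verified": (72.0, 72.4, 69.2, 72.0),
    "Terminal-Bench 2": (49.4, 41.6, 40.5, 31.9),
    "LiveCodeBench v6": (78.9, 80.7, 74.6, 80.5),
    "BFCL-V4": (72.2, 68.5, 67.3, 55.5),
    "BrowseComp": (63.8, 61.0, 61.0, 48.1),
    "TAU2-Bench": (79.5, 79.0, 81.2, 69.8),
}


def leader(benchmark: str) -> str:
    """Best-scoring Qwen 3.5 Medium model on one benchmark."""
    row = dict(zip(COLUMNS, SCORES[benchmark]))
    return max(QWEN, key=lambda model: row[model])


def wins_over_baseline(model: str) -> int:
    """Number of benchmarks where `model` scores strictly above GPT-5 mini."""
    idx, base = COLUMNS.index(model), COLUMNS.index(BASELINE)
    return sum(1 for row in SCORES.values() if row[idx] > row[base])


for benchmark in SCORES:
    print(f"{benchmark:20s} leader: {leader(benchmark)}")
for model in QWEN:
    print(f"{model:10s} beats {BASELINE} on {wins_over_baseline(model)} of {len(SCORES)}")
```

Ties, such as the 122B-A10B and GPT-5 mini on SWE-bench Verified, are not counted as wins.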

How to Get Started

Novita AI offers multiple ways to integrate the Qwen 3.5 Medium models into your workflow — from zero-code exploration to production API integration.

Try It in the Playground

Before writing any code, test the models interactively in the Novita AI Playground:

Switch on Thinking Mode to see the model’s internal reasoning chain.
Adjust parameters: Temperature and Top_p for controlling output creativity.
Stress-test long context: Paste documents up to 262K tokens to evaluate recall and comprehension.
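For readers unsure what Temperature and Top_p actually change, here is a minimal sketch of the textbook definitions applied to a toy three-token distribution. It illustrates the general technique only; the sampler used by any particular serving stack may differ in details.

```python
import math


def apply_temperature(logits: list, temperature: float) -> list:
    """Softmax over logits / temperature; lower values sharpen the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs: list, top_p: float) -> dict:
    """Keep the smallest set of most likely tokens whose mass reaches top_p, renormalized."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


toy_logits = [2.0, 1.0, 0.0]
for temperature in (0.5, 1.0, 1.5):
    probs = apply_temperature(toy_logits, temperature)
    print(temperature, [round(p, 3) for p in probs])

# With top_p=0.9 the least likely of the three tokens is dropped before sampling.
print(top_p_filter(apply_temperature(toy_logits, 1.0), 0.9))
```

Lower temperature and lower Top_p both concentrate probability on the likeliest tokens, which is why they make output more deterministic.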

New users signing up for a Novita AI account receive free trial credits — enough to run dozens of tests at no cost.

Step 1: Get Your API Key

Visit novita.ai and sign up or log in.
Navigate to API Keys in the dashboard.
Click Add New Key and copy it immediately — the key is shown only once.

Step 2: Call the API

Base URL: https://api.novita.ai/openai
Model IDs: qwen/qwen3.5-35b-a3b, qwen/qwen3.5-27b, qwen/qwen3.5-122b-a10b

Python Example:

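from openai import OpenAI

# The OpenAI SDK works unchanged; only the base URL and key point at Novita AI.
client = OpenAI(
    api_key="<Your API Key>",
    base_url="https://api.novita.ai/openai",
)

response = client.chat.completions.create(
    model="qwen/qwen3.5-122b-a10b",  # or qwen/qwen3.5-35b-a3b, qwen/qwen3.5-27b
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, how are you?"},
    ],
    max_tokens=65536,  # upper bound on generated tokens (the 65K max output)
    temperature=0.7,
)

# Print the text of the first (and only) completion.
print(response.choices[0].message.content)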
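If you prefer not to install the SDK, the same request can be sketched with the Python standard library. Two assumptions to flag: the `/chat/completions` path is inferred from how the OpenAI SDK appends it to the base URL, and the `NOVITA_API_KEY` environment variable name is a hypothetical convention for keeping the key out of source code.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.novita.ai/openai"


def build_chat_request(prompt, model="qwen/qwen3.5-35b-a3b", max_tokens=1024, api_key=None):
    """Build (but do not send) a chat completion request."""
    # NOVITA_API_KEY is a hypothetical variable name; use whatever your deployment defines.
    key = api_key or os.environ.get("NOVITA_API_KEY", "")
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",  # assumed path, as appended by the OpenAI SDK
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {key}", "Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60.0):
    """Send the request and return the assistant text (performs a network call)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


request = build_chat_request("Summarize this pull request.", api_key="<Your API Key>")
print(request.get_method(), request.full_url)
```

Calling `send(request)` with a valid key performs the actual API call; building and sending are kept separate so the payload can be inspected or logged first.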

Step 3: Integrate with Your Tools

Because Novita AI exposes OpenAI- and Anthropic-compatible APIs, Qwen 3.5 Medium models work with the tools you already use:

Coding assistants: Cline, Cursor, OpenCode, Trae, Continue
Agent frameworks: LangChain, Langflow
Anthropic-compatible workflows: Claude Code
Chat UIs: AnythingLLM
Hugging Face Hub: Novita AI is listed as an Inference Provider for supported models.
Personal AI agents: OpenClaw — connect Qwen 3.5 Medium models to build always-on agents across messaging platforms.

Conclusion

A year ago, getting GPT-5-mini-level agentic performance from an open-weight model you could fine-tune and self-host was not realistic. The Qwen 3.5 Medium series changes that equation, particularly on tool use and multi-step agent workflows, where these models don’t just match proprietary alternatives but measurably exceed them.

For teams evaluating their model stack, the practical next step is straightforward: run your own prompts in the Playground, benchmark against your current provider, and decide based on your data — not ours.

Novita AI is an AI cloud platform that gives developers a simple API for deploying AI models, along with affordable and reliable GPU cloud infrastructure for building and scaling.

Frequently Asked Questions

What’s the difference between Qwen3.5-Flash and Qwen3.5-35B-A3B?

Qwen3.5-Flash is the proprietary hosted version of the 35B-A3B, available only through Alibaba Cloud. It offers a 1M context window and built-in official tools. The open-weight 35B-A3B on Novita AI supports 262K context natively and is extensible up to 1M tokens.

Can I use these models for commercial applications?

Yes. All three open-weight models are released under the Apache 2.0 license — no restrictions on commercial use, fine-tuning, or redistribution.

How do these compare to the flagship Qwen3.5-397B-A17B?

The 397B flagship (also available on Novita AI at $0.60/Mt input, $3.60/Mt output) is stronger on competitive programming and some reasoning tasks. But the Medium models are surprisingly close: the 122B-A10B matches or exceeds it on agentic benchmarks, and the 35B-A3B delivers 85–95% of the flagship’s performance at roughly half the cost (about 42% of the flagship’s input price and 56% of its output price).
