Developers building agentic coding assistants face a critical choice: pay $3/$15 per million input/output tokens for closed models like Claude Sonnet 4.5, or switch to open reasoning models that promise similar capabilities at a fraction of the cost. Qwen3-235B-A22B-Thinking-2507 from Alibaba challenges this trade-off with a dedicated "thinking mode" for extended reasoning — at $0.30/$3.00 per 1M input/output tokens via Novita AI.
This guide walks through how to integrate Qwen3-235B-A22B-Thinking-2507 into Claude Code, the Anthropic-compatible terminal agent that enables agentic coding workflows. You’ll see how this 235B MoE model (22B active parameters per token) leverages Claude Code’s tool-rich environment to automate complex coding tasks with extended reasoning traces.
Does Qwen3-235B-A22B-Thinking-2507 Deliver Real Reasoning Power?
Qwen3-235B-A22B-Thinking-2507 is the latest thinking-capable model in the Qwen3 lineup, offering major advances in reasoning ability. It excels at logical problem solving, mathematics, scientific analysis, coding, and academic evaluations, delivering competitive performance among open-source reasoning models. Beyond its reasoning strengths, it brings improved general capabilities: more accurate instruction following, advanced tool integration, highly natural text generation, and better alignment with human intent. The model also supports a native 262,144-token (256K) context window, enabling coherent and in-depth handling of long documents and complex discussions.
Architecture and Capabilities
| Technical Parameter | Specification | Description |
|---|---|---|
| Model Type | Causal Language Model | Based on Transformer architecture |
| Total Parameters | 235B | MoE design: only 22B parameters activated per token |
| Non-Embedding Parameters | 234B | Parameters excluding embeddings |
| Number of Layers | 94 layers | Deep neural network structure |
| Attention Heads | Q: 64, KV: 4 | Uses GQA mechanism |
| Number of Experts | 128 | MoE architecture design |
| Activated Experts | 8 | Dynamic expert selection |
| Context Length | 262,144 tokens | Native long context support |
Benchmark Performance (Reasoning Tasks)

Qwen3-235B-A22B-Thinking-2507 excels in reasoning-heavy and knowledge-intensive tasks, particularly mathematics, multilingual knowledge, and long-document comprehension. Its performance is consistently competitive with larger models on complex cognitive and understanding benchmarks.
Cost and Token Efficiency
At $0.30 per 1M input tokens and $3.00 per 1M output tokens, Qwen3-235B-A22B-Thinking-2507 offers 90% cost savings on input and 80% savings on output compared to Claude Sonnet 4.5 ($3/$15 per 1M tokens). For extended reasoning tasks, the model can output up to 81K tokens — meaning a single complex task might cost $0.24 in output tokens, compared to $1.22 with Claude.
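As a back-of-the-envelope check on those numbers (using only the per-token prices listed above), the output-cost comparison for an 81K-token reasoning trace works out as follows:

```python
# Output-cost comparison for one long reasoning trace.
# Prices are USD per 1M output tokens, as listed above.
QWEN_OUTPUT_PRICE = 3.00     # Qwen3-235B-A22B-Thinking-2507 via Novita AI
CLAUDE_OUTPUT_PRICE = 15.00  # Claude Sonnet 4.5

def output_cost(tokens: int, price_per_million: float) -> float:
    """Cost in USD for generating `tokens` output tokens."""
    return tokens / 1_000_000 * price_per_million

tokens = 81_000  # an extended reasoning trace near the model's output limit
print(f"Qwen3 output cost:  ${output_cost(tokens, QWEN_OUTPUT_PRICE):.3f}")   # roughly $0.24
print(f"Claude output cost: ${output_cost(tokens, CLAUDE_OUTPUT_PRICE):.3f}") # roughly $1.22
```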

Why Qwen3-235B-A22B-Thinking-2507 Works Best with Claude Code
Claude Code is a terminal-based agentic coding interface published by Anthropic. It orchestrates multi-step workflows by invoking tools (file editing, bash commands, search), managing context across tasks, and iterating based on feedback. Qwen3-235B-A22B-Thinking-2507’s explicit reasoning traces align perfectly with this agentic paradigm — the model shows its planning steps before executing tool calls, making complex workflows debuggable and transparent.
1. Optimized for Agentic Interactions
Qwen3-235B-A22B-Thinking-2507 is designed to take actions, use tools, and manage multi-step tasks. Its thinking mode outputs structured reasoning chains that match Claude Code’s expectation of plan → execute → verify workflows. When the model plans a refactoring across 5 files, you see the step-by-step reasoning before any file edits occur.
2. Rich Toolchains and API Support
Claude Code provides pre-configured access to file system operations, bash execution, grep/search, git commands, and external tool integrations. Qwen3 models support tool calling schemas, JSON mode, and function definitions — enabling seamless invocation of Claude Code’s tool suite for tasks like automated testing, deployment scripts, and multi-file refactoring.
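Tool definitions follow the standard OpenAI function-calling schema. As an illustrative sketch only (the `run_tests` tool and its parameters are hypothetical — Claude Code supplies its own tool suite automatically), a tool might be declared like this and passed via the `tools` parameter of `client.chat.completions.create`:

```python
# Hypothetical tool definition in the standard OpenAI function-calling schema.
run_tests_tool = {
    "type": "function",
    "function": {
        "name": "run_tests",
        "description": "Run the project's test suite and return the results.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {
                    "type": "string",
                    "description": "Directory or file containing the tests.",
                },
                "verbose": {
                    "type": "boolean",
                    "description": "Include per-test output.",
                },
            },
            "required": ["path"],
        },
    },
}

# Passed to the API as:
#   client.chat.completions.create(..., tools=[run_tests_tool])
# The model then returns structured tool calls the agent can execute.
```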
3. Real-Time Feedback Loops
The model’s thinking mode enables adaptive debugging: if a tool call fails (e.g., test suite errors), the reasoning trace shows what the model assumed, allowing you to correct misconceptions mid-session. This is critical for agentic workflows where early errors cascade across 20+ steps.
4. Extended Output for Complex Reasoning
Claude Code tasks like “refactor authentication flow across 8 files” or “debug memory leak with profiler integration” require multi-step plans with 10K+ token outputs. Qwen3-235B-A22B-Thinking-2507 supports up to 81K tokens for complex reasoning — far exceeding standard model limits — while keeping costs manageable ($0.24 per 81K output vs $1.22 for Claude).
How to Use Qwen3-235B-A22B-Thinking-2507 with Claude Code
Novita AI provides an Anthropic-compatible API endpoint, meaning Claude Code works with Qwen3-235B-A22B-Thinking-2507 via simple environment variable configuration — no code changes required. The model’s 256K context window and $0.30/$3.00 per 1M input/output token pricing make it ideal for extended coding sessions.
Prerequisites — Get Novita AI API Key
Step 1: Create a free account at Novita AI and log in.
Step 2: Navigate to Model Library and search for qwen/qwen3-235b-a22b-thinking-2507.
Step 3: Click Start Free Trial to activate access (Novita provides trial credits for new users).
Step 4: Go to Settings → API Keys and click Generate API Key. Copy the key.
Step 5: Verify the API connection with this Python test:
```python
from openai import OpenAI

client = OpenAI(
    api_key="<Your API Key>",
    base_url="https://api.novita.ai/openai",
)

response = client.chat.completions.create(
    model="qwen/qwen3-235b-a22b-thinking-2507",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, how are you?"},
    ],
    max_tokens=32768,
    temperature=0.7,
)
print(response.choices[0].message.content)
```
You should see the model’s response with reasoning traces enclosed in <think> tags.
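If you want to separate the reasoning trace from the final answer programmatically, here is a minimal sketch (assuming, as described above, that the trace arrives wrapped in a single <think>…</think> block):

```python
import re

def split_thinking(text: str) -> tuple[str, str]:
    """Split a model response into (reasoning_trace, final_answer).

    Assumes the reasoning is wrapped in one <think>...</think> block;
    returns an empty trace if no block is present.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return "", text.strip()
    trace = match.group(1).strip()
    answer = (text[: match.start()] + text[match.end():]).strip()
    return trace, answer

# Example:
trace, answer = split_thinking("<think>Plan: greet the user.</think>Hello! How can I help?")
print(trace)   # → Plan: greet the user.
print(answer)  # → Hello! How can I help?
```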
Claude Code Setup Guide
Step 1: Installing Claude Code
```shell
# macOS, Linux, WSL:
curl -fsSL https://claude.ai/install.sh | bash

# Windows PowerShell:
irm https://claude.ai/install.ps1 | iex

# Windows CMD:
curl -fsSL https://claude.ai/install.cmd -o install.cmd && install.cmd && del install.cmd
```
Windows requires Git for Windows. Install it first if you don’t have it.
Step 2: Setting Up Environment Variables
Claude Code uses 4 environment variables to route API requests to Novita AI:
```shell
# For macOS/Linux (Bash/Zsh):
# Set the Anthropic SDK compatible API endpoint provided by Novita.
export ANTHROPIC_BASE_URL="https://api.novita.ai/anthropic"
export ANTHROPIC_AUTH_TOKEN="<Novita API Key>"
# Set the model provided by Novita.
export ANTHROPIC_MODEL="qwen/qwen3-235b-a22b-thinking-2507"
export ANTHROPIC_SMALL_FAST_MODEL="qwen/qwen3-235b-a22b-thinking-2507"
```

```powershell
# For Windows (PowerShell):
$env:ANTHROPIC_BASE_URL = "https://api.novita.ai/anthropic"
$env:ANTHROPIC_AUTH_TOKEN = "<Novita API Key>"
$env:ANTHROPIC_MODEL = "qwen/qwen3-235b-a22b-thinking-2507"
$env:ANTHROPIC_SMALL_FAST_MODEL = "qwen/qwen3-235b-a22b-thinking-2507"
```
Explanation:
- `ANTHROPIC_BASE_URL`: Points Claude Code to Novita's Anthropic-compatible endpoint
- `ANTHROPIC_AUTH_TOKEN`: Your Novita API key (not an Anthropic key)
- `ANTHROPIC_MODEL`: Primary model for complex tasks (thinking mode)
- `ANTHROPIC_SMALL_FAST_MODEL`: Fallback model for quick operations (set to the same model if you want consistent reasoning behavior)
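Before launching Claude Code, you can sanity-check that all four routing variables are actually set in the current shell. A small helper sketch:

```python
import os

# The four variables Claude Code reads to route requests to Novita AI.
REQUIRED_VARS = [
    "ANTHROPIC_BASE_URL",
    "ANTHROPIC_AUTH_TOKEN",
    "ANTHROPIC_MODEL",
    "ANTHROPIC_SMALL_FAST_MODEL",
]

def missing_vars(env=None):
    """Return the names of any required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]

if __name__ == "__main__":
    gaps = missing_vars()
    if gaps:
        print("Missing:", ", ".join(gaps))
    else:
        print("All four Claude Code routing variables are set.")
```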
Step 3: Starting Claude Code
Navigate to your project directory and start Claude Code:
```shell
cd <your-project-directory>
claude
```
You’ll see the Claude Code prompt inside an interactive session. The model’s thinking mode activates automatically for complex queries.
Example task:
> Refactor the authentication module to use JWT tokens instead of sessions. Update all 5 related files and add unit tests.
Claude Code will analyze the request, invoke Qwen3-235B-A22B-Thinking-2507 to generate a multi-step plan (visible in <think> blocks), then execute file edits, write tests, and verify the changes.
Pro Tip: For math-heavy or algorithm-design tasks, increase max_tokens toward the model's 81K output limit in your API calls to take full advantage of Qwen3-235B-A22B-Thinking-2507's extended reasoning capacity. Set this via Claude Code's configuration if it exposes token limits.
Conclusion
Qwen3-235B-A22B-Thinking-2507 brings extended reasoning, transparent chain-of-thought output, and strong tool-use capabilities to Claude Code’s agentic workflow — at a fraction of the cost of closed models. For developers running complex coding tasks, the combination offers both performance and budget efficiency.
Key Takeaway: Set up four environment variables, point Claude Code at Novita AI’s Anthropic-compatible endpoint, and you’re running advanced reasoning workflows in minutes. Try Qwen3-235B-A22B-Thinking-2507 on Novita AI and start building today.
Frequently Asked Questions
What makes Qwen3-235B-A22B-Thinking-2507 different from general instruction models?
It’s a thinking-only model that outputs structured reasoning traces in <think> blocks before generating code, making complex agentic workflows transparent and debuggable. Unlike general instruction models, it’s optimized for reasoning-heavy tasks like competitive programming, algorithm design, and multi-step debugging.
Can I use Qwen3-235B-A22B-Thinking-2507 outside Claude Code?
Yes — it works with any tool supporting OpenAI-compatible APIs. Trae (GUI IDE), OpenCode (terminal agent), Cursor (code editor), and custom Python/Node.js scripts all support it via Novita AI’s https://api.novita.ai/v3/openai endpoint.
Can I self-host the model?
Yes — self-hosting requires an estimated 4× H100 80GB GPUs for FP8 inference. For most developers, Novita AI’s API is more cost-effective than self-hosting unless you run 10,000+ tasks/month.
Recommended Reading
- Use GLM-4.5 in Trae to Unlock Smarter Coding Agents
- Use Codex CLI with Novita AI
- Use MiniMax M2.1 in OpenCode
Novita AI is an AI & agent cloud platform helping developers and startups build, deploy, and scale models and agentic applications with high performance, reliability, and cost efficiency.