Qwen3-235B-A22B-Thinking-2507 Now Available on Novita AI

Alibaba’s revolutionary Qwen3-235B-A22B-Thinking-2507 is now live on Novita AI.

This thinking model outperforms OpenAI O4-mini, Claude 4 Opus, and other industry leaders on reasoning benchmarks at a fraction of the cost. With 92.3% on AIME25 and native 256K context support, it sets new standards for complex problem-solving. The model has 235B total parameters (22B activated) and enhanced thinking capabilities for mathematical reasoning, coding, and analytical tasks.

Current pricing on Novita AI: 131,072-token context, $0.3/M input tokens, $3/M output tokens.

What is Qwen3-235B-A22B-Thinking-2507?

Qwen3-235B-A22B-Thinking-2507 is an enhanced thinking version of Alibaba’s flagship 235B parameter model. After three months of continuous optimization, this model delivers significant improvements in reasoning depth, mathematical problem-solving, and complex analytical tasks.

The model builds on the Qwen3-235B-A22B architecture with specialized enhancements for thinking capabilities. It achieves state-of-the-art results among open-source thinking models across academic benchmarks.

Breakthrough Enhancements

Qwen3-235B-A22B-Thinking-2507 Comprehensive Benchmark Results

Revolutionary Thinking Improvements
Dramatic leaps in logical reasoning, mathematics, science, and coding. The model excels at academic benchmarks that typically require human expertise.

Enhanced General Capabilities
Markedly better instruction following, tool usage, and text generation. Improved alignment with human preferences while maintaining structured thinking processes.

Extended Context Mastery
256K long-context understanding maintains perfect coherence across entire documents, research papers, and extended reasoning chains.

Note: Extended thinking capabilities are strongly recommended for highly complex reasoning tasks that require deep analytical processing.

Key Features and Capabilities

    Technical Specifications

    • Type: Causal Language Models
    • Training Stage: Pretraining & Post-training
    • Total Parameters: 235B with 22B activated
    • Number of Parameters (Non-Embedding): 234B
    • Architecture: 94 layers
    • Attention Heads (GQA): 64 for Q and 4 for KV
    • Experts: 128 total with 8 activated
    • Context Length: 262,144 tokens natively
    • Mode: Thinking mode only (automatic <think> tag inclusion)
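
    Because the model runs in thinking mode only, responses carry reasoning delimited by a closing </think> tag (depending on the chat template, the opening <think> tag may be omitted). A minimal sketch of separating the reasoning from the final answer (the helper name is illustrative):

```python
def split_thinking(raw: str) -> tuple[str, str]:
    """Separate the <think> reasoning block from the final answer.

    Assumes the reasoning ends at the closing </think> tag; the opening
    <think> tag may or may not be present in the raw output.
    """
    if "</think>" in raw:
        thinking, answer = raw.split("</think>", 1)
        # Drop the opening tag if the template included it.
        thinking = thinking.removeprefix("<think>").strip()
        return thinking, answer.strip()
    # No thinking block found: treat the whole output as the answer.
    return "", raw.strip()

thinking, answer = split_thinking("<think>2+2 is 4</think>The answer is 4.")
```

    This keeps display logic simple: show `answer` to end users and surface `thinking` only when inspecting the reasoning trace.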

    Performance Benchmarks

    Qwen3-235B-A22B-Thinking-2507 doesn’t just compete with industry leaders—it dominates them. This thinking model consistently outperforms premium models across comprehensive evaluation benchmarks.

    Comprehensive Performance Results

    Qwen3-235B-A22B-Thinking-2507 Comprehensive Benchmark Results
    Source: Qwen official Hugging Face model card

    Key Performance Highlights

    Mathematical Excellence
    Achieves 92.3% on AIME25, matching OpenAI O4-mini (92.7%) and surpassing all other models. Scores 83.9% on HMMT25, exceeding Gemini-2.5 Pro (82.5%) and significantly outperforming Claude 4 Opus Thinking (58.3%).

    Superior Knowledge Understanding
    Reaches 81.1% on GPQA, matching Deepseek-R1 and OpenAI O4-mini levels. Achieves field-leading 64.9% on SuperGPQA, surpassing all competitors including Gemini-2.5 Pro (62.3%).

    Coding Leadership
    Dominates with 74.1% on LiveCodeBench v6, outperforming all models including Gemini-2.5 Pro (72.5%) and OpenAI O4-mini (71.8%). Achieves highest CFEval score of 2134 among all evaluated models.

    Reasoning Mastery
    Scores 78.4% on LiveBench, competitive with top models. Achieves 18.2% on HLE (text-only subset), edging past OpenAI O4-mini (18.1%) and far surpassing the previous Qwen3 version (11.8%).

    User Preference Alignment
    Achieves 79.7% on Arena-Hard v2, second only to OpenAI O3 (80.8%) and surpassing Deepseek-R1 (72.2%). Scores highest on WritingBench at 88.3%, exceeding all competitors.

    Multilingual Excellence
    Leads with 80.6% on MultiIF, ahead of OpenAI O3 (80.3%) and the rest of the field. Achieves a breakthrough 60.1% on PolyMATH, significantly outperforming all competitors including Gemini-2.5 Pro (52.2%).

    How to Access Qwen3-235B-A22B-Thinking-2507 on Novita AI

    Getting started with Qwen3-235B-A22B-Thinking-2507 on Novita AI is straightforward for developers and researchers.

    Use the Playground (No Coding Required)

    Instant Access: Sign up and start experimenting with Qwen3-235B-A22B-Thinking-2507 in seconds.

    Interactive Interface: Test complex reasoning prompts and visualize structured outputs in real-time.

    Model Comparison: Compare Qwen3-235B-A22B-Thinking-2507 with other leading models for your specific use case.

    Integrate via API (For Developers)

    Connect Qwen3-235B-A22B-Thinking-2507 to your applications with Novita AI’s unified REST API.

    Option 1: Direct API Integration (Python Example)

    from openai import OpenAI
      
    client = OpenAI(
        base_url="https://api.novita.ai/v3/openai",
        api_key="<YOUR_NOVITA_API_KEY>",
    )
    
    model = "qwen/qwen3-235b-a22b-thinking-2507"
    stream = True # or False
    max_tokens = 65536
    system_content = "Be a helpful assistant"
    temperature = 1
    top_p = 1
    min_p = 0
    top_k = 50
    presence_penalty = 0
    frequency_penalty = 0
    repetition_penalty = 1
    response_format = { "type": "text" }
    
    chat_completion_res = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "system",
                "content": system_content,
            },
            {
                "role": "user",
                "content": "Hi there!",
            }
        ],
        stream=stream,
        max_tokens=max_tokens,
        temperature=temperature,
        top_p=top_p,
        presence_penalty=presence_penalty,
        frequency_penalty=frequency_penalty,
        response_format=response_format,
        extra_body={
          "top_k": top_k,
          "repetition_penalty": repetition_penalty,
          "min_p": min_p
        }
      )
    
    if stream:
        for chunk in chat_completion_res:
            print(chunk.choices[0].delta.content or "", end="")
    else:
        print(chat_completion_res.choices[0].message.content)
      
    

    Key Features:

    • OpenAI-Compatible API for seamless integration
    • Flexible parameter control for fine-tuning
    • Streaming support for real-time responses

    Option 2: Multi-Agent Workflows with OpenAI Agents SDK

    Build sophisticated multi-agent systems using Qwen3-235B-A22B-Thinking-2507:

    • Plug-and-Play Integration: Use Novita AI’s models in any OpenAI Agents workflow
    • Advanced Agent Capabilities: Support for handoffs, routing, and tool integration
    • Scalable Architecture: Design agents that can delegate tasks and run complex functions

    Connect with Third-Party Platforms

    Development Tools: Seamlessly integrate with popular IDEs and development environments like Cursor, Trae, and Cline through OpenAI-compatible APIs.

    Orchestration Frameworks: Connect with LangChain, Dify, Langflow, and other AI orchestration platforms using official connectors.

    Hugging Face Integration: Use Qwen3-235B-A22B-Thinking-2507 in Spaces, pipelines, or with the Transformers library via Novita AI endpoints.

    Best Practices for Optimal Performance

    Follow these official Qwen team guidelines for optimal performance.

    Recommended Sampling Parameters

    • Temperature: 0.6
    • TopP: 0.95
    • TopK: 20
    • MinP: 0

    Adjust presence_penalty between 0 and 2 to reduce repetition; higher values may occasionally cause language mixing.
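
    Applied to the OpenAI-compatible endpoint shown earlier, the recommended settings can be bundled as keyword arguments; top_k and min_p are not standard OpenAI parameters, so (as in the integration example above) they travel in extra_body. A sketch, not an official snippet:

```python
# Recommended sampling settings from the Qwen team, expressed as the
# keyword arguments accepted by client.chat.completions.create(...).
recommended = dict(
    temperature=0.6,
    top_p=0.95,
    presence_penalty=0.0,  # raise toward 2.0 if outputs start repeating
    extra_body={"top_k": 20, "min_p": 0},
)

# Usage: client.chat.completions.create(model=..., messages=..., **recommended)
```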

    Output Length Guidelines

    • Standard queries: 32,768 tokens
    • Complex problems: 81,920 tokens for math and programming competitions
    • Sufficient space ensures detailed thinking processes and comprehensive responses
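
    One way to encode these budgets in application code (the task labels are illustrative, not part of any API):

```python
def pick_max_tokens(task: str) -> int:
    """Suggested output budgets from the guidelines above.

    Competition-level math and programming problems benefit from the
    larger budget so the thinking process is not truncated.
    """
    if task in {"math_competition", "programming_competition"}:
        return 81920
    return 32768
```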

    Standardize Output Format

    • Math Problems: Add “Please reason step by step, and put your final answer within \boxed{}.”
    • Multiple-Choice: Include the JSON instruction: “Please show your choice in the answer field with only the choice letter, e.g., "answer": "C".”
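
    The math-problem instruction can be appended programmatically; a minimal sketch (the helper name is hypothetical):

```python
# Recommended suffix for math problems, verbatim from the guidelines above.
MATH_SUFFIX = "Please reason step by step, and put your final answer within \\boxed{}."

def math_prompt(problem: str) -> str:
    """Append the standardized math instruction to a user problem."""
    return f"{problem}\n{MATH_SUFFIX}"

prompt = math_prompt("What is 17 * 23?")
```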

    Multi-Turn Conversations

    Historical outputs should exclude thinking content. Only final responses belong in conversation history. The Jinja2 chat template handles this automatically.
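
    If you manage conversation history yourself rather than relying on the chat template, the same rule can be applied client-side; a sketch, assuming the reasoning ends at a closing </think> tag:

```python
def to_history_message(raw_reply: str) -> dict:
    """Build an assistant history entry with thinking content removed.

    Only the text after the closing </think> tag (the final response)
    is kept; replies without a tag are stored as-is.
    """
    final = raw_reply.split("</think>", 1)[-1].strip()
    return {"role": "assistant", "content": final}
```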

    Conclusion

    Qwen3-235B-A22B-Thinking-2507 proves open-source AI can match commercial thinking models. With 92.3% on AIME25 and 74.1% on LiveCodeBench, it rivals OpenAI O4-mini and Claude 4 Opus at a fraction of the cost. The 256K context window and enhanced thinking architecture excel at complex reasoning tasks. At $0.3/M input tokens on Novita AI, it democratizes access to state-of-the-art AI reasoning.

    Try Qwen3-235B-A22B-Thinking-2507 on Novita AI today.

    Novita AI is an AI cloud platform that offers developers an easy way to deploy AI models through a simple API, along with affordable and reliable GPU cloud infrastructure for building and scaling.

