Qwen3-235B-A22B-Thinking-2507 Now Available on Novita AI

Alibaba’s revolutionary Qwen3-235B-A22B-Thinking-2507 is now live on Novita AI.

This thinking model outperforms OpenAI O4-mini, Claude 4 Opus, and other industry leaders on reasoning benchmarks at a fraction of the cost. With 92.3% on AIME25 and native 256K context support, it sets new standards for complex problem-solving. The model has 235B total parameters (22B activated) and enhanced thinking capabilities for mathematical reasoning, coding, and analytical tasks.

Current pricing on Novita AI: 131,072-token context, $0.3/M input tokens, $3/M output tokens.

What is Qwen3-235B-A22B-Thinking-2507?

Qwen3-235B-A22B-Thinking-2507 is an enhanced thinking version of Alibaba’s flagship 235B parameter model. After three months of continuous optimization, this model delivers significant improvements in reasoning depth, mathematical problem-solving, and complex analytical tasks.

The model builds on the Qwen3-235B-A22B architecture with specialized enhancements for thinking capabilities. It achieves state-of-the-art results among open-source thinking models across academic benchmarks.

Breakthrough Enhancements

Qwen3-235B-A22B-Thinking-2507 Comprehensive Benchmark Results

Revolutionary Thinking Improvements
Dramatic leaps in logical reasoning, mathematics, science, and coding. The model excels at academic benchmarks that typically require human expertise.

Enhanced General Capabilities
Markedly better instruction following, tool usage, and text generation. Improved alignment with human preferences while maintaining structured thinking processes.

Extended Context Mastery
256K long-context understanding maintains perfect coherence across entire documents, research papers, and extended reasoning chains.

Note: Extended thinking capabilities are strongly recommended for highly complex reasoning tasks that require deep analytical processing.

Key Features and Capabilities

    Technical Specifications

    • Type: Causal Language Models
    • Training Stage: Pretraining & Post-training
    • Total Parameters: 235B with 22B activated
    • Number of Parameters (Non-Embedding): 234B
    • Architecture: 94 layers
    • Attention Heads (GQA): 64 for Q and 4 for KV
    • Experts: 128 total with 8 activated
    • Context Length: 262,144 tokens natively
    • Mode: Thinking mode only (automatic <think> tag inclusion)
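
    Because the model runs in thinking mode only, responses carry reasoning delimited by a closing </think> tag (depending on the chat template, the opening <think> tag may be omitted). A minimal sketch of separating the reasoning from the final answer (the helper name is illustrative):

```python
def split_thinking(raw: str) -> tuple[str, str]:
    """Separate the <think> reasoning block from the final answer.

    Assumes the reasoning ends at the closing </think> tag; the opening
    <think> tag may or may not be present in the raw output.
    """
    if "</think>" in raw:
        thinking, answer = raw.split("</think>", 1)
        # Drop the opening tag if the template included it.
        thinking = thinking.removeprefix("<think>").strip()
        return thinking, answer.strip()
    # No thinking block found: treat the whole output as the answer.
    return "", raw.strip()

thinking, answer = split_thinking("<think>2+2 is 4</think>The answer is 4.")
```

    This keeps display logic simple: show `answer` to end users and surface `thinking` only when inspecting the reasoning trace.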

    Performance Benchmarks

    Qwen3-235B-A22B-Thinking-2507 doesn’t just compete with industry leaders—it dominates them. This thinking model consistently outperforms premium models across comprehensive evaluation benchmarks.

    Comprehensive Performance Results

    Qwen3-235B-A22B-Thinking-2507 Comprehensive Benchmark Results
    Source: Qwen official Hugging Face model card

    Key Performance Highlights

    Mathematical Excellence
    Achieves 92.3% on AIME25, matching OpenAI O4-mini (92.7%) and surpassing all other models. Scores 83.9% on HMMT25, exceeding Gemini-2.5 Pro (82.5%) and significantly outperforming Claude 4 Opus Thinking (58.3%).

    Superior Knowledge Understanding
    Reaches 81.1% on GPQA, matching Deepseek-R1 and OpenAI O4-mini levels. Achieves field-leading 64.9% on SuperGPQA, surpassing all competitors including Gemini-2.5 Pro (62.3%).

    Coding Leadership
    Dominates with 74.1% on LiveCodeBench v6, outperforming all models including Gemini-2.5 Pro (72.5%) and OpenAI O4-mini (71.8%). Achieves highest CFEval score of 2134 among all evaluated models.

    Reasoning Mastery
    Scores 78.4% on LiveBench, competitive with top models. Achieves 18.2% on HLE (text-only subset), edging past OpenAI O4-mini (18.1%) and far surpassing the previous Qwen3 version (11.8%).

    User Preference Alignment
    Achieves 79.7% on Arena-Hard v2, second only to OpenAI O3 (80.8%) and surpassing Deepseek-R1 (72.2%). Scores highest on WritingBench at 88.3%, exceeding all competitors.

    Multilingual Excellence
    Leads with 80.6% on MultiIF, ahead of OpenAI O3 (80.3%) and the rest of the field. Achieves a breakthrough 60.1% on PolyMATH, significantly outperforming all competitors including Gemini-2.5 Pro (52.2%).

    How to Access Qwen3-235B-A22B-Thinking-2507 on Novita AI

    Getting started with Qwen3-235B-A22B-Thinking-2507 on Novita AI is straightforward for developers and researchers.

    Use the Playground (No Coding Required)

    Instant Access: Sign up and start experimenting with Qwen3-235B-A22B-Thinking-2507 in seconds.

    Interactive Interface: Test complex reasoning prompts and visualize structured outputs in real-time.

    Model Comparison: Compare Qwen3-235B-A22B-Thinking-2507 with other leading models for your specific use case.

    Integrate via API (For Developers)

    Connect Qwen3-235B-A22B-Thinking-2507 to your applications with Novita AI’s unified REST API.

    Option 1: Direct API Integration (Python Example)

    from openai import OpenAI
      
    client = OpenAI(
        base_url="https://api.novita.ai/v3/openai",
        api_key="<YOUR_NOVITA_API_KEY>",
    )
    
    model = "qwen/qwen3-235b-a22b-thinking-2507"
    stream = True # or False
    max_tokens = 65536
    system_content = "Be a helpful assistant"
    temperature = 1
    top_p = 1
    min_p = 0
    top_k = 50
    presence_penalty = 0
    frequency_penalty = 0
    repetition_penalty = 1
    response_format = { "type": "text" }
    
    chat_completion_res = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "system",
                "content": system_content,
            },
            {
                "role": "user",
                "content": "Hi there!",
            }
        ],
        stream=stream,
        max_tokens=max_tokens,
        temperature=temperature,
        top_p=top_p,
        presence_penalty=presence_penalty,
        frequency_penalty=frequency_penalty,
        response_format=response_format,
        extra_body={
          "top_k": top_k,
          "repetition_penalty": repetition_penalty,
          "min_p": min_p
        }
      )
    
    if stream:
        for chunk in chat_completion_res:
            print(chunk.choices[0].delta.content or "", end="")
    else:
        print(chat_completion_res.choices[0].message.content)
      
    

    Key Features:

    • OpenAI-Compatible API for seamless integration
    • Flexible parameter control for fine-tuning
    • Streaming support for real-time responses

    Option 2: Multi-Agent Workflows with OpenAI Agents SDK

    Build sophisticated multi-agent systems using Qwen3-235B-A22B-Thinking-2507:

    • Plug-and-Play Integration: Use Novita AI’s models in any OpenAI Agents workflow
    • Advanced Agent Capabilities: Support for handoffs, routing, and tool integration
    • Scalable Architecture: Design agents that can delegate tasks and run complex functions

    Connect with Third-Party Platforms

    Development Tools: Seamlessly integrate with popular IDEs and development environments like Cursor, Trae, and Cline through OpenAI-compatible APIs.

    Orchestration Frameworks: Connect with LangChain, Dify, Langflow, and other AI orchestration platforms using official connectors.

    Hugging Face Integration: Use Qwen3-235B-A22B-Thinking-2507 in Spaces, pipelines, or with the Transformers library via Novita AI endpoints.

    Best Practices for Optimal Performance

    Follow these official Qwen team guidelines for optimal performance.

    Recommended Sampling Parameters

    • Temperature: 0.6
    • TopP: 0.95
    • TopK: 20
    • MinP: 0

    Adjust presence_penalty between 0 and 2 to reduce repetition; higher values may occasionally cause language mixing.
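
    Applied to the OpenAI-compatible endpoint shown earlier, the recommended settings can be bundled as keyword arguments; top_k and min_p are not standard OpenAI parameters, so (as in the integration example above) they travel in extra_body. A sketch, not an official snippet:

```python
# Recommended sampling settings from the Qwen team, expressed as the
# keyword arguments accepted by client.chat.completions.create(...).
recommended = dict(
    temperature=0.6,
    top_p=0.95,
    presence_penalty=0.0,  # raise toward 2.0 if outputs start repeating
    extra_body={"top_k": 20, "min_p": 0},
)

# Usage: client.chat.completions.create(model=..., messages=..., **recommended)
```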

    Output Length Guidelines

    • Standard queries: 32,768 tokens
    • Complex problems: 81,920 tokens for math and programming competitions
    • Sufficient space ensures detailed thinking processes and comprehensive responses
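
    One way to encode these budgets in application code (the task labels are illustrative, not part of any API):

```python
def pick_max_tokens(task: str) -> int:
    """Suggested output budgets from the guidelines above.

    Competition-level math and programming problems benefit from the
    larger budget so the thinking process is not truncated.
    """
    if task in {"math_competition", "programming_competition"}:
        return 81920
    return 32768
```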

    Standardize Output Format

    • Math Problems: Add “Please reason step by step, and put your final answer within \boxed{}.”
    • Multiple-Choice: Include the JSON instruction: “Please show your choice in the answer field with only the choice letter, e.g., "answer": "C".”
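
    The math-problem instruction can be appended programmatically; a minimal sketch (the helper name is hypothetical):

```python
# Recommended suffix for math problems, verbatim from the guidelines above.
MATH_SUFFIX = "Please reason step by step, and put your final answer within \\boxed{}."

def math_prompt(problem: str) -> str:
    """Append the standardized math instruction to a user problem."""
    return f"{problem}\n{MATH_SUFFIX}"

prompt = math_prompt("What is 17 * 23?")
```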

    Multi-Turn Conversations

    Historical outputs should exclude thinking content. Only final responses belong in conversation history. The Jinja2 chat template handles this automatically.
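
    If you manage conversation history yourself rather than relying on the chat template, the same rule can be applied client-side; a sketch, assuming the reasoning ends at a closing </think> tag:

```python
def to_history_message(raw_reply: str) -> dict:
    """Build an assistant history entry with thinking content removed.

    Only the text after the closing </think> tag (the final response)
    is kept; replies without a tag are stored as-is.
    """
    final = raw_reply.split("</think>", 1)[-1].strip()
    return {"role": "assistant", "content": final}
```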

    Conclusion

    Qwen3-235B-A22B-Thinking-2507 proves open-source AI can match commercial thinking models. With 92.3% on AIME25 and 74.1% on LiveCodeBench, it rivals OpenAI O4-mini and Claude 4 Opus at a fraction of the cost. The 256K context window and enhanced thinking architecture excel at complex reasoning tasks. At $0.3/M input tokens on Novita AI, it democratizes access to state-of-the-art AI reasoning.

    Try Qwen3-235B-A22B-Thinking-2507 on Novita AI today.

    Novita AI is an AI cloud platform that offers developers an easy way to deploy AI models through a simple API, along with affordable and reliable GPU cloud infrastructure for building and scaling.

