
ERNIE-4.5 Thinking: Baidu's 21B MoE Model Delivers 7x Faster Performance with Only 3B Active Parameters

ERNIE-4.5-21B-A3B-Thinking is now available on the Novita AI platform, bringing Baidu’s groundbreaking thinking capabilities to developers and businesses through our developer-friendly infrastructure. This latest release from Baidu represents a significant advancement in lightweight AI models, introducing enhanced reasoning depth and quality that sets it apart from previous generations.

With its efficient Mixture-of-Experts (MoE) architecture activating only 3B parameters per token from a total of 21B parameters, ERNIE-4.5-21B-A3B-Thinking delivers heavyweight performance with lightweight resource requirements.

Whether you’re developing complex reasoning applications, building mathematical solvers, or exploring advanced AI capabilities, ERNIE-4.5-21B-A3B-Thinking on Novita AI simplifies the development process with our optimized infrastructure and easy integration options.

Current pricing on Novita AI: 131,072 Context, $0.07/1M input tokens, $0.28/1M output tokens

Try ERNIE-4.5-21B-A3B-Thinking Demo

What is ERNIE-4.5-21B-A3B-Thinking?

ERNIE-4.5-21B-A3B-Thinking is a text-only, post-trained Mixture-of-Experts (MoE) model from Baidu’s ERNIE 4.5 series, which comprises 10 different models. It has 21B total parameters, of which only 3B are activated per token.

The model introduces three key improvements over previous versions:

Enhanced Thinking Capabilities: ERNIE-4.5-21B-A3B-Thinking delivers significantly improved performance on reasoning tasks including logical reasoning, mathematics, science, coding, text generation, and academic benchmarks that typically require human expertise. The model features increased thinking length, making it particularly effective for highly complex reasoning tasks.

Efficient Tool Utilization: The model demonstrates exceptional capabilities in tool usage and function calling, making it ideal for agent-based applications. This enables seamless integration with external systems and APIs for real-world applications.

Extended Context Understanding: With its enhanced 128K long-context understanding capabilities (131,072 tokens), ERNIE-4.5-21B-A3B-Thinking can process extensive documents, codebases, and complex multi-turn conversations without losing context or accuracy.

ERNIE-4.5-21B-A3B-Thinking is post-trained with SFT (Supervised Fine-Tuning), DPO (Direct Preference Optimization), and Baidu’s proprietary UPO (Unified Preference Optimization). The weights are released in Transformer-style format to align with the wider community, so the model works with both the PyTorch and PaddlePaddle ecosystems, including vLLM and FastDeploy. A single 80GB GPU is enough to run it, which keeps it easy to integrate into existing workflows.

Explore ERNIE-4.5-21B-A3B-Thinking in Novita AI Playground →

Model Specifications

ERNIE-4.5-21B-A3B-Thinking employs a sophisticated Mixture-of-Experts architecture optimized for both performance and efficiency. The model’s design enables selective activation of the most relevant experts for each token, achieving an optimal balance between capability and computational cost.

Core Specifications:

  • Total Parameters: 21B
  • Activated Parameters: 3B per token
  • Layers: 28
  • Attention Heads: 20 query heads / 4 key-value heads
  • Text Experts: 64 total / 6 activated per token
  • Shared Experts: 2
  • Context Length: 131,072 tokens
  • Max Output: 65,536 tokens
  • Input/Output Capabilities: Text
  • Training Stage: Post-training
  • Provider: Baidu
  • License: Apache 2.0
  • Quantization: FP8
  • GPU Requirement: 80GB × 1 GPU
  • Reasoning Support: Yes
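
The 20 query heads / 4 key-value heads split in the list above is a grouped-query attention layout, and its main practical effect is a smaller key-value cache at long context. The sketch below estimates that cache size. The layer and head counts come from the list; `head_dim=128` and a 16-bit cache (`bytes_per_value=2`) are assumptions made for illustration, not published figures from this article.

```python
def kv_cache_bytes(num_tokens, layers=28, kv_heads=4, head_dim=128, bytes_per_value=2):
    """Bytes needed to cache keys and values for num_tokens tokens.

    layers and kv_heads come from the specification list; head_dim=128 and
    bytes_per_value=2 (a 16-bit cache) are assumptions for illustration.
    """
    # Factor 2: one key vector and one value vector per layer and KV head.
    return 2 * layers * kv_heads * head_dim * bytes_per_value * num_tokens

GIB = 1024 ** 3
full_context = 131_072  # tokens

gqa = kv_cache_bytes(full_context)               # 4 KV heads, as specified
mha = kv_cache_bytes(full_context, kv_heads=20)  # hypothetical: one KV head per query head

print(f"Cache at full context with 4 KV heads:  {gqa / GIB:.1f} GiB")
print(f"Cache at full context with 20 KV heads: {mha / GIB:.1f} GiB")
```

Under these assumptions a full 131,072-token sequence needs about 7 GiB of cache, compared with about 35 GiB if every query head had its own key-value head.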

The MoE architecture of ERNIE-4.5-21B-A3B-Thinking represents a breakthrough in efficient AI design, inheriting innovations from the broader ERNIE 4.5 series including modality-isolated routing and router orthogonal loss techniques. By activating only 3B parameters per token while maintaining access to 21B total parameters, the model delivers enterprise-grade performance without the typical computational overhead.
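
To make "activating only 3B parameters per token" concrete, here is a minimal sketch of generic top-k expert routing using the expert counts from the specification list (64 routed experts, 6 active per token). It is a textbook illustration in plain Python, not Baidu's implementation: the router logits are random stand-ins, and renormalizing the softmax over only the selected experts is one common convention assumed here. The 2 shared experts would run for every token and are therefore not routed.

```python
import math
import random

NUM_EXPERTS = 64  # routed text experts (from the specification list)
TOP_K = 6         # experts activated per token (from the specification list)

def route_token(router_logits, top_k=TOP_K):
    """Pick the top_k experts for one token and return (index, weight) pairs.

    Weights are a softmax over the selected logits only, so they sum to 1.
    """
    # Indices of the top_k largest logits.
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:top_k]
    # Numerically stable softmax restricted to the selected experts.
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

# Random logits stand in for the output of a learned router layer.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

assignment = route_token(logits)
for expert, weight in assignment:
    print(f"expert {expert:2d}  weight {weight:.3f}")
```

Only the selected experts' feed-forward blocks are evaluated for the token, which is why per-token compute tracks the 3B active parameters rather than the 21B total.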

The model’s 131,072 token context window and 65,536 token output capability enable processing of extensive documents and generating comprehensive responses, making it ideal for complex analytical tasks, long-form content generation, and detailed technical documentation.
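
When sizing requests against these limits, a small helper can keep `max_tokens` within bounds. The sketch below assumes that prompt and output tokens share the 131,072-token window, which is the usual convention but is not stated explicitly here; `clamp_max_tokens` is a hypothetical helper name, not part of any SDK.

```python
CONTEXT_WINDOW = 131_072  # tokens, from the specification list
MAX_OUTPUT = 65_536       # tokens, from the specification list

def clamp_max_tokens(prompt_tokens, requested):
    """Return the largest usable max_tokens value for a prompt of the given size.

    Assumes prompt and output tokens share one context window.
    """
    if prompt_tokens >= CONTEXT_WINDOW:
        raise ValueError("prompt alone fills the context window")
    remaining = CONTEXT_WINDOW - prompt_tokens
    return min(requested, MAX_OUTPUT, remaining)

print(clamp_max_tokens(prompt_tokens=10_000, requested=65_536))   # short prompt
print(clamp_max_tokens(prompt_tokens=100_000, requested=65_536))  # long prompt
```

With a 10,000-token prompt the full 65,536-token output budget is available; with a 100,000-token prompt the helper reduces it to the 31,072 tokens that remain in the window.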

Performance Highlights

ERNIE-4.5-21B-A3B-Thinking demonstrates exceptional performance across multiple domains, achieving state-of-the-art (SOTA) results as part of the ERNIE 4.5 family. The model’s enhanced thinking capabilities and improved reasoning depth make it particularly effective for tasks requiring multi-step analysis and complex problem-solving.

[Figure: ERNIE-4.5-21B-A3B-Thinking benchmark results]

Key performance strengths include:

  • Logical Reasoning: ERNIE-4.5-21B-A3B-Thinking excels at complex logical deduction tasks, demonstrating superior performance in puzzles, syllogisms, and multi-step reasoning problems that require careful analysis and systematic thinking.
  • Mathematics: The model shows advanced mathematical problem-solving capabilities, handling everything from basic arithmetic to complex calculus, linear algebra, and abstract mathematical concepts with high accuracy.
  • Science: Enhanced scientific reasoning and analysis capabilities enable ERNIE-4.5-21B-A3B-Thinking to tackle problems in physics, chemistry, biology, and other scientific domains, providing detailed explanations and accurate solutions.
  • Coding: With improved code generation and debugging capabilities across multiple programming languages, the model can write, analyze, and optimize code while providing clear explanations of programming concepts and best practices.
  • Text Generation: High-quality natural language generation makes ERNIE-4.5-21B-A3B-Thinking ideal for creative writing, technical documentation, and content creation tasks requiring nuanced understanding and expression.
  • Academic Benchmarks: The model achieves competitive performance on benchmarks requiring human-level expertise, demonstrating its readiness for professional and academic applications.

Test ERNIE-4.5-21B-A3B-Thinking’s Capabilities in Novita AI Playground →

Getting Started with ERNIE-4.5-21B-A3B-Thinking on Novita AI Platform

Novita AI provides multiple pathways to access ERNIE-4.5-21B-A3B-Thinking, tailored to different technical expertise levels and use cases. Whether you’re a business user exploring AI capabilities or a developer building production applications, our platform offers the tools and flexibility you need.

Use the Playground (Available Now – No Coding Required)

The Novita AI playground offers the fastest way to experience ERNIE-4.5-21B-A3B-Thinking’s capabilities without any technical setup:

Instant Access: Sign up and start experimenting with ERNIE-4.5-21B-A3B-Thinking in seconds. No API keys or configuration required for initial testing.

Interactive Interface: Test prompts and visualize outputs in real-time with our intuitive web interface. Adjust parameters like temperature (0.7 default), max tokens (up to 65,536), and system prompts to see how they affect model behavior.

Model Configuration: Fine-tune response format, temperature, top-p, min-p, top-k, presence penalty, frequency penalty, and repetition penalty to optimize outputs for your specific use case.

The playground is perfect for prototyping, testing ideas, and understanding model capabilities before full implementation. Export your successful prompts and configurations directly to code for seamless transition to production.

Start Testing ERNIE-4.5-21B-A3B-Thinking in the Playground →

Integrate via API (Live and Ready – For Developers)

For production deployments, Novita AI offers robust API access to ERNIE-4.5-21B-A3B-Thinking with enterprise-grade reliability and performance through OpenAI-compatible endpoints.

Direct API Integration (Python Example)

Connect ERNIE-4.5-21B-A3B-Thinking to your applications using our OpenAI-compatible API:

from openai import OpenAI

# OpenAI-compatible client pointed at the Novita AI endpoint
client = OpenAI(
    base_url="https://api.novita.ai/openai",
    api_key="",  # insert your Novita AI API key
)

model = "baidu/ernie-4.5-21B-a3b-thinking"
stream = True  # set to False for a single, non-streamed response
max_tokens = 32768
system_content = "Be a helpful assistant"
temperature = 1
top_p = 1
min_p = 0
top_k = 50
presence_penalty = 0
frequency_penalty = 0
repetition_penalty = 1
response_format = {"type": "text"}

chat_completion_res = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "system",
            "content": system_content,
        },
        {
            "role": "user",
            "content": "Hi there!",
        },
    ],
    stream=stream,
    max_tokens=max_tokens,
    temperature=temperature,
    top_p=top_p,
    presence_penalty=presence_penalty,
    frequency_penalty=frequency_penalty,
    response_format=response_format,
    # Sampling parameters outside the standard OpenAI signature go in extra_body
    extra_body={
        "top_k": top_k,
        "repetition_penalty": repetition_penalty,
        "min_p": min_p,
    },
)

if stream:
    for chunk in chat_completion_res:
        # Skip chunks that arrive without choices
        if chunk.choices:
            print(chunk.choices[0].delta.content or "", end="")
else:
    print(chat_completion_res.choices[0].message.content)

Equivalent examples are available for TypeScript, Java, Go, and Shell, so the same integration works across different tech stacks.

Multi-Agent Workflows with OpenAI Agents SDK

Build sophisticated multi-agent systems that leverage ERNIE-4.5-21B-A3B-Thinking’s enhanced thinking capabilities:

  • Plug-and-Play Integration: Use ERNIE-4.5-21B-A3B-Thinking in any OpenAI Agents workflow without modification
  • Advanced Agent Capabilities: Full support for handoffs, routing, and tool integration for complex workflows
  • Function Calling: Describe tools with JSON Schema definitions for structured interactions and tool usage
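
As a rough illustration of the function-calling bullet above, the sketch below defines one tool in the OpenAI-style `tools` format and shows how an application might execute a tool call the model returns. It runs offline: the call at the end is simulated rather than produced by the model, and `get_weather`, `REGISTRY`, and `run_tool_call` are hypothetical names chosen for this example. In a real request the `tools` list would be passed as the `tools` argument of `client.chat.completions.create`.

```python
import json

# Hypothetical local tool; the name, signature, and data are illustrative only.
def get_weather(city: str) -> dict:
    return {"city": city, "forecast": "sunny"}  # stub data, not a real lookup

# JSON Schema description of the tool in the OpenAI-style "tools" format.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Return a weather forecast for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# Maps tool names the model may request to local Python callables.
REGISTRY = {"get_weather": get_weather}

def run_tool_call(name: str, arguments_json: str) -> str:
    """Execute one tool call and return a JSON string for the follow-up 'tool' message."""
    if name not in REGISTRY:
        return json.dumps({"error": f"unknown tool: {name}"})
    kwargs = json.loads(arguments_json)
    return json.dumps(REGISTRY[name](**kwargs))

# Simulated call, shaped like the name and arguments found on
# message.tool_calls[i].function in a chat completion response.
print(run_tool_call("get_weather", '{"city": "Beijing"}'))
```

The returned string would be sent back to the model in a message with role `tool` so it can compose the final answer.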

Deployment Options

Novita AI offers flexible deployment options to match your specific requirements and usage patterns.

Serverless API

ERNIE-4.5-21B-A3B-Thinking is available via Novita’s serverless API for immediate access and pay-per-token pricing:

  • No setup required: Start using the model instantly without infrastructure management
  • Pay-per-use pricing: $0.07 per 1M input tokens, $0.28 per 1M output tokens
  • OpenAI-compatible endpoints: Drop-in replacement for existing OpenAI integrations
  • Automatic scaling: Handle variable workloads without capacity planning
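
For budgeting, the per-token prices above translate into request costs as in the sketch below. The token counts are made-up examples, and `estimate_cost` is a hypothetical helper. The sketch also assumes that any thinking tokens the model generates are billed as output tokens, which is typical for reasoning models but is not confirmed here.

```python
INPUT_PRICE_PER_M = 0.07   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 0.28  # USD per 1M output tokens

def estimate_cost(input_tokens, output_tokens):
    """Estimated cost in USD for one request at the listed serverless prices."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000

# Example: a 2,000-token prompt that yields 8,000 generated tokens.
print(f"${estimate_cost(2_000, 8_000):.6f}")
```

For this example the estimate is $0.002380, with the output side accounting for most of the cost because long reasoning traces are generated tokens.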

On-demand Deployments

For high-volume or latency-sensitive applications, on-demand deployments provide dedicated resources:

  • High-performance serving stack: Optimized inference engine for maximum throughput
  • High reliability: Dedicated GPU resources ensure consistent performance
  • No rate limits: Scale according to your needs without artificial restrictions
  • GPU Requirements: 80GB VRAM (recommended: NVIDIA A100 80GB or H100 80GB for optimal performance)
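
A quick back-of-the-envelope check shows why one 80GB GPU is sufficient. The sketch below estimates weight memory only, using one byte per parameter for FP8 and two for BF16; it ignores runtime overhead, and the remaining headroom figure is a rough indication, not a measured value.

```python
TOTAL_PARAMS = 21e9  # total parameters, from the specification list
GPU_MEMORY_GB = 80   # single-GPU requirement, from the specification list

def weight_gb(params, bytes_per_param):
    """Approximate weight size in GB (10^9 bytes), ignoring runtime overhead."""
    return params * bytes_per_param / 1e9

for name, nbytes in [("FP8", 1), ("BF16", 2)]:
    size = weight_gb(TOTAL_PARAMS, nbytes)
    headroom = GPU_MEMORY_GB - size
    print(f"{name}: ~{size:.0f} GB of weights, ~{headroom:.0f} GB left for cache and activations")
```

All 21B parameters have to be resident in memory even though only 3B are active per token, so the memory footprint follows the total parameter count while compute follows the active count.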

Connect with Third-Party Platforms

ERNIE-4.5-21B-A3B-Thinking on Novita AI integrates seamlessly with your existing development ecosystem:

Development Tools: Direct integration with popular IDEs and coding tools such as Cursor, Cline, Continue, Codex, and Qwen Code through OpenAI-compatible APIs.

Orchestration Frameworks: Native support for LangChain, Dify, CrewAI, Langflow, and other AI orchestration platforms using official connectors.

Hugging Face Integration: As an official inference provider for Hugging Face, Novita AI ensures broad ecosystem compatibility and easy model deployment.

Conclusion

ERNIE-4.5-21B-A3B-Thinking on Novita AI represents a breakthrough in efficient AI reasoning, providing developers and organizations with access to Baidu’s most advanced thinking capabilities through our reliable, scalable platform.

The model’s combination of enhanced reasoning depth, efficient tool utilization, and 128K (131,072-token) context understanding makes it a strong choice for complex reasoning tasks. With only 3B of its 21B parameters activated per token and a single 80GB GPU required, ERNIE-4.5-21B-A3B-Thinking offers a strong balance of capability and efficiency.

Ready to experience the power of ERNIE-4.5-21B-A3B-Thinking? Our playground provides instant access with no setup required – perfect for exploring the model’s capabilities and testing your use cases. Start with our interactive interface to understand the model’s strengths, then seamlessly transition to API integration when you’re ready for production deployment.

Access ERNIE-4.5-21B-A3B-Thinking on Novita AI Playground Now →

Transform your projects with advanced AI reasoning at just $0.07 per million input tokens. Start exploring in seconds!

Novita AI is an AI cloud platform that offers developers an easy way to deploy AI models using our simple API, while also providing the affordable and reliable GPU cloud for building and scaling.