ERNIE 4.5, Baidu’s state-of-the-art open-weight Mixture-of-Experts (MoE) model, is now available on Novita AI!
Here’s the current ERNIE 4.5 pricing on Novita AI:
baidu/ernie-4.5-vl-28b-a3b: 30k context, free
baidu/ernie-4.5-21b-a3b: 120k context, free
baidu/ernie-4.5-0.3b: 120k context, free
baidu/ernie-4.5-vl-424b-a47b: 123k context, $0.42/M tokens input, $1.25/M tokens output
baidu/ernie-4.5-300b-a47b-paddle: 123k context, $0.30/M tokens input, $1.00/M tokens output
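For a quick sense of what the paid tiers cost in practice, here is a small Python sketch (a hypothetical helper, using the per-million-token prices listed above) that estimates the cost of a single request:

```python
# Prices in USD per 1M tokens: (input, output), from the table above.
PRICING = {
    "baidu/ernie-4.5-vl-424b-a47b": (0.42, 1.25),
    "baidu/ernie-4.5-300b-a47b-paddle": (0.30, 1.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request."""
    price_in, price_out = PRICING[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# 100k input tokens plus 10k output tokens on the 300B model:
print(round(estimate_cost("baidu/ernie-4.5-300b-a47b-paddle", 100_000, 10_000), 4))  # → 0.04
```

In other words, even a long-context request on the largest text model costs a few cents.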
What is ERNIE 4.5?
ERNIE 4.5 is Baidu’s latest open-source model series, comprising 10 different models. The lineup includes Mixture-of-Experts (MoE) models with 47 billion and 3 billion activated parameters—the largest model reaches a total of 424 billion parameters—as well as a dense model with 0.3 billion parameters.

Architectural Innovation: These models use an innovative multimodal heterogeneous model structure that achieves cross-modal knowledge fusion through a cross-modal parameter sharing mechanism, while preserving dedicated parameter spaces for individual modalities. This architecture is highly suitable for the continual pre-training paradigm from large language models to multimodal models, significantly enhancing multimodal understanding capabilities while maintaining or even improving text task performance.
Framework & Training: All ERNIE 4.5 series models are trained, inferred, and deployed efficiently using the PaddlePaddle deep learning framework. During large language model pre-training, the Model FLOPs Utilization (MFU) reaches 47%.
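To make the 47% MFU figure concrete: MFU is the achieved training FLOPs per second divided by the cluster's theoretical peak. A minimal sketch, using the common ~6 × N_active FLOPs-per-token rule of thumb for a forward+backward pass (the cluster size and per-GPU peak below are hypothetical, chosen only to illustrate the calculation):

```python
def mfu(tokens_per_sec: float, active_params_b: float,
        peak_tflops_per_gpu: float, n_gpus: int) -> float:
    """Model FLOPs Utilization: achieved training FLOPs/s over cluster peak.
    Uses the ~6 * N_active FLOPs-per-token approximation (forward + backward)."""
    achieved = 6 * active_params_b * 1e9 * tokens_per_sec
    peak = peak_tflops_per_gpu * 1e12 * n_gpus
    return achieved / peak

# Hypothetical setup: 47B activated params, 64 GPUs at 989 TFLOPS peak each.
# A throughput of ~105k training tokens/s works out to roughly 47% MFU.
print(f"{mfu(105_000, 47, 989, 64):.2%}")
```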
Performance & Capabilities
Benchmark Achievement: Experimental results show that this model series achieves state-of-the-art (SOTA) performance across multiple text and multimodal benchmarks, with particularly outstanding results in:

- Instruction following – Understanding and executing complex commands
- World knowledge retention – Comprehensive factual knowledge storage and recall
- Visual understanding – Advanced image comprehension capabilities
- Multimodal reasoning tasks – Complex reasoning across text and visual inputs
Model Specifications (ERNIE-4.5-300B-A47B):
- Total Parameters: 300B with 47B activated per token
- Architecture: 54 layers, 64 query heads / 8 key-value heads
- Expert Configuration: 64 text experts (8 activated) / 64 vision experts (8 activated)
- Context Length: 131,072 tokens
- Modality: Text with multimodal training capabilities
Accessibility & Deployment:
- Apache 2.0 license – Model weights are open-sourced for both academic research and industrial applications
- Industrial-grade development toolkit – Based on PaddlePaddle’s comprehensive suite with ERNIEKit support
- Broad chip compatibility – Works across various hardware platforms, lowering barriers for post-training and deployment
- Excellent inference performance – Multiple deployment options including FastDeploy, Transformers, and vLLM integration
- Flexible quantization – 4-bit, 2-bit, and FP8 options for different resource constraints
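Back-of-the-envelope arithmetic shows why those quantization options matter. The sketch below counts weight storage only (it ignores KV cache, activations, and framework overhead, so real deployments need headroom on top of these numbers):

```python
def weight_memory_gb(total_params_billions: float, bits_per_param: float) -> float:
    """Memory needed just to store the weights, in GB (1 GB = 1e9 bytes)."""
    return total_params_billions * 1e9 * bits_per_param / 8 / 1e9

# ERNIE-4.5-300B-A47B weight footprint at different precisions:
for label, bits in [("FP16", 16), ("FP8", 8), ("4-bit", 4), ("2-bit", 2)]:
    print(f"{label:>5}: ~{weight_memory_gb(300, bits):,.0f} GB")
```

Roughly 150 GB of 4-bit weights explains why that variant spreads across several 80GB GPUs, while ~75 GB at 2-bit can fit on a single large-memory GPU.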
Technical Innovations
Multimodal Mixture of Experts Model Pre-training
The Approach: ERNIE 4.5 performs joint training across text and visual modalities to better capture subtle differences in multimodal information, improving performance in text generation, image understanding, and multimodal reasoning tasks.
The Innovation: To enable mutual enhancement between the two modalities during learning while preventing one modality from hindering another’s learning, Baidu proposes a multimodal heterogeneous mixture of experts model structure with:
- Modality-isolated routing for specialized expert allocation
- Router orthogonal loss to enhance expert specialization
- Multimodal token-balanced loss for optimal resource utilization across modalities
Advanced Optimization: These architectural choices ensure that both modalities are effectively represented, allowing for multimodal mutual promotion and improvement during training.
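As a toy illustration of modality-isolated routing (not Baidu's implementation; the dimensions, names, and random weights here are made up), each modality gets its own router and expert pool, so a text token's top-k selection can never land on a vision expert:

```python
import math
import random

random.seed(0)
D, EXPERTS_PER_MODALITY, TOP_K = 16, 4, 2

# One router weight matrix per modality: a text token is only ever scored
# against text experts, and a vision token only against vision experts.
routers = {
    m: [[random.gauss(0, 1) for _ in range(EXPERTS_PER_MODALITY)] for _ in range(D)]
    for m in ("text", "vision")
}

def route(token, modality):
    """Return (expert indices, gate weights) for one token of the given modality."""
    w = routers[modality]
    logits = [sum(token[i] * w[i][e] for i in range(D))
              for e in range(EXPERTS_PER_MODALITY)]
    top = sorted(range(EXPERTS_PER_MODALITY), key=lambda e: logits[e])[-TOP_K:]
    m = max(logits[e] for e in top)
    exps = [math.exp(logits[e] - m) for e in top]      # stable softmax
    gates = [x / sum(exps) for x in exps]
    return top, gates

token = [random.gauss(0, 1) for _ in range(D)]
experts, gates = route(token, "text")
print(experts, sum(gates))  # two text-expert indices; gate weights sum to 1
```

The router-orthogonality and token-balance losses from the list above would be added on top of a gate like this during training.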
Efficient Training and Inference Framework
Training Optimizations: To support efficient training of ERNIE 4.5 models, Baidu proposes heterogeneous hybrid parallelism and hierarchical load balancing strategies. Through multiple advanced technologies, they significantly improve pre-training throughput:
- Intra-node expert parallelism – Optimized parallel processing within computing nodes
- Memory-efficient pipeline scheduling – Smart memory management during training
- FP8 mixed precision training – Advanced numerical precision techniques
- Fine-grained recomputation – Strategic recomputation for memory efficiency
Inference Breakthroughs: For inference optimization, they propose several cutting-edge methods:
- Multi-expert parallel collaboration method – Collaborative processing across model experts
- Convolutional code quantization algorithm – Advanced encoding techniques for compression
- Near-lossless quantization – 4-bit and 2-bit quantization with minimal performance degradation
- PD disaggregation with dynamic role switching – Adaptive deployment that can more fully utilize resources and improve inference performance of ERNIE 4.5 MoE models
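Baidu has not published the details of its convolutional code quantization, but the general idea behind low-bit weight compression can be sketched with a plain round-to-nearest 4-bit scheme (a deliberately simple baseline, not the actual algorithm):

```python
import random

def quantize_4bit(weights):
    """Symmetric round-to-nearest 4-bit quantization of a weight list.
    Returns (integer codes in [-8, 7], a per-tensor scale factor)."""
    scale = max(abs(w) for w in weights) / 7
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    return [c * scale for c in codes]

random.seed(0)
weights = [random.gauss(0, 1) for _ in range(1000)]
codes, scale = quantize_4bit(weights)
restored = dequantize(codes, scale)
mean_err = sum(abs(a - b) for a, b in zip(weights, restored)) / len(weights)
print(f"mean abs error: {mean_err:.4f}")
```

Each weight shrinks from 16 bits to 4 plus a shared scale; the "near-lossless" claim means keeping that reconstruction error small enough that benchmark scores barely move.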
Modality-Specific Post-Training
Tailored Optimization: To meet different requirements in practical scenarios, Baidu performs modality-specific fine-tuning on the pre-trained models:
Large Language Models (LLMs):
- Optimized specifically for general language understanding and generation
Vision-Language Models (VLMs):
- Focus on visual-language understanding
- Support both thinking and non-thinking modes
Multi-stage Training Pipeline: Each model employs multi-stage post-training using advanced techniques:
- SFT (Supervised Fine-Tuning) – Learning from supervised examples
- DPO (Direct Preference Optimization) – Direct optimization based on preferences
- UPO (Unified Preference Optimization) – Baidu’s proprietary unified preference optimization technique
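Of these, DPO has a particularly compact objective that is easy to write down. A minimal sketch of the per-pair loss (standard DPO, not Baidu's UPO variant; the log-probabilities below are made-up example values):

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair: -log sigmoid(beta * margin), where the
    margin measures how much the policy favors the chosen response over the
    rejected one, relative to a frozen reference model."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The loss shrinks as the policy favors the chosen response more strongly:
print(dpo_loss(-10.0, -12.0, -10.5, -11.0) < dpo_loss(-11.0, -11.0, -10.5, -11.0))  # → True
```

UPO then unifies preference signals across stages, but its exact formulation is Baidu's own.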
Deployment and Integration
ERNIE-4.5 models can be deployed using FastDeploy, Hugging Face Transformers, or vLLM. Different quantization levels and serving frameworks allow the models to run efficiently across a range of hardware setups:
- Full-precision models require many GPUs (typically 16 GPUs with at least 80GB VRAM each).
- Quantized models (like WINT4, W4A8C8, or WINT2) drastically reduce VRAM needs. For example, WINT4 or W4A8C8 can run on 4–8×80GB GPUs, while WINT2 enables single-GPU deployment if you have at least 141GB VRAM.
- Transformers integration allows for flexible use but still requires substantial VRAM for large models.
- vLLM is ideal for high-throughput, multi-GPU inference. Quantized models help fit within available GPU memory.
- Recommended sampling: Temperature=0.8, Top-P=0.8
How to Access ERNIE-4.5-300B-A47B on Novita AI
Getting started with ERNIE-4.5-300B-A47B on Novita AI is streamlined and risk-free. New users receive $10 in free credits, enough to explore the model without upfront costs.
Use the Playground (No Coding Required)
Instant Access: Sign up, claim your free credits, and start experimenting with ERNIE 4.5 and other top models in seconds.
Interactive UI: Test prompts, chain-of-thought reasoning, and visualize results in real time.
Model Comparison: Effortlessly switch between ERNIE 4.5, Qwen 3, Llama 4, DeepSeek, and more to find the perfect fit for your needs.
Integrate via API (For Developers)
Seamlessly connect ERNIE 4.5 to applications, workflows, or chatbots using Novita AI’s unified REST API. No model weight management or infrastructure concerns—Novita AI provides multi-language SDKs (Python, Node.js, cURL) and advanced parameter controls.
Option 1: Direct API Integration (cURL and Python)
cURL:
curl "https://api.novita.ai/v3/openai/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer " \
  -d @- << 'EOF'
{
  "model": "baidu/ernie-4.5-300b-a47b-paddle",
  "messages": [
    {
      "role": "system",
      "content": "Be a helpful assistant"
    },
    {
      "role": "user",
      "content": "Hi there!"
    }
  ],
  "response_format": { "type": "text" },
  "max_tokens": 32768,
  "temperature": 1,
  "top_p": 1,
  "min_p": 0,
  "top_k": 50,
  "presence_penalty": 0,
  "frequency_penalty": 0,
  "repetition_penalty": 1
}
EOF

Python (OpenAI SDK):
from openai import OpenAI

client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",
    api_key="",
)

model = "baidu/ernie-4.5-300b-a47b-paddle"
stream = True  # or False
max_tokens = 6000
system_content = "Be a helpful assistant"
temperature = 1
top_p = 1
min_p = 0
top_k = 50
presence_penalty = 0
frequency_penalty = 0
repetition_penalty = 1
response_format = {"type": "text"}

chat_completion_res = client.chat.completions.create(
    model=model,
    messages=[
        {"role": "system", "content": system_content},
        {"role": "user", "content": "Hi there!"},
    ],
    stream=stream,
    max_tokens=max_tokens,
    temperature=temperature,
    top_p=top_p,
    presence_penalty=presence_penalty,
    frequency_penalty=frequency_penalty,
    response_format=response_format,
    extra_body={
        "top_k": top_k,
        "repetition_penalty": repetition_penalty,
        "min_p": min_p,
    },
)

if stream:
    for chunk in chat_completion_res:
        print(chunk.choices[0].delta.content or "", end="")
else:
    print(chat_completion_res.choices[0].message.content)
Option 2: Multi-Agent Workflows with OpenAI Agents SDK
Build advanced multi-agent systems by integrating Novita AI with the OpenAI Agents SDK:
- Plug-and-play: Use Novita AI’s ERNIE 4.5 in any OpenAI Agents workflow
- Supports handoffs, routing, and tool use: Design agents that can delegate, triage, or run functions, all powered by ERNIE 4.5’s capabilities
- Python integration: Simply point the SDK to Novita’s endpoint (https://api.novita.ai/v3/openai) and use your API key
Connect ERNIE 4.5 API on Third-Party Platforms
- Hugging Face: Use ERNIE 4.5 in Spaces, pipelines, or with the Transformers library via Novita AI endpoints.
- Agent & Orchestration Frameworks: Easily connect Novita AI with partner platforms like Continue, AnythingLLM, LangChain, Dify and Langflow through official connectors and step-by-step integration guides.
- OpenAI-Compatible API: Enjoy hassle-free migration and integration with tools such as Cline and Cursor, designed for the OpenAI API standard.
Conclusion
ERNIE 4.5 is a versatile, open-source AI model series that combines advanced Mixture-of-Experts architecture with innovative multimodal learning. It enables powerful, efficient performance across both language and vision tasks, making it a strong foundation for next-generation AI applications.
Ready to experience the future of AI reasoning? Try ERNIE 4.5 on Novita AI.
Novita AI is an AI cloud platform that offers developers an easy way to deploy AI models using our simple API, while also providing the affordable and reliable GPU cloud for building and scaling.