ERNIE 4.5, Baidu’s state-of-the-art open-weight Mixture-of-Experts (MoE) model, is now available on Novita AI!
Here’s the current ERNIE 4.5 pricing on Novita AI:
baidu/ernie-4.5-vl-28b-a3b: 30k context, free
baidu/ernie-4.5-21b-a3b: 120k context, free
baidu/ernie-4.5-0.3b: 120k context, free
baidu/ernie-4.5-vl-424b-a47b: 123k context, $0.42/M tokens input, $1.25/M tokens output
baidu/ernie-4.5-300b-a47b-paddle: 123k context, $0.30/M tokens input, $1.00/M tokens output
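For a quick sense of what the paid tiers cost in practice, here is a small Python sketch (a hypothetical helper, using the per-million-token prices listed above) that estimates the cost of a single request:

```python
# Prices in USD per 1M tokens: (input, output), from the table above.
PRICING = {
    "baidu/ernie-4.5-vl-424b-a47b": (0.42, 1.25),
    "baidu/ernie-4.5-300b-a47b-paddle": (0.30, 1.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request."""
    price_in, price_out = PRICING[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# 100k input tokens plus 10k output tokens on the 300B model:
print(round(estimate_cost("baidu/ernie-4.5-300b-a47b-paddle", 100_000, 10_000), 4))  # → 0.04
```

In other words, even a long-context request on the largest text model costs a few cents.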
What is ERNIE 4.5?
ERNIE 4.5 is Baidu’s latest open-source model series, comprising 10 different models. The lineup includes Mixture-of-Experts (MoE) models with 47 billion and 3 billion activated parameters—the largest model reaches a total of 424 billion parameters—as well as a dense model with 0.3 billion parameters.

Architectural Innovation: These models use an innovative multimodal heterogeneous model structure that achieves cross-modal knowledge fusion through a cross-modal parameter sharing mechanism, while preserving dedicated parameter spaces for individual modalities. This architecture is highly suitable for the continual pre-training paradigm from large language models to multimodal models, significantly enhancing multimodal understanding capabilities while maintaining or even improving text task performance.
Framework & Training: All ERNIE 4.5 series models are trained, inferred, and deployed efficiently using the PaddlePaddle deep learning framework. During large language model pre-training, the Model FLOPs Utilization (MFU) reaches 47%.
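To make the 47% MFU figure concrete: MFU is the achieved training FLOPs per second divided by the cluster's theoretical peak. A minimal sketch, using the common ~6 × N_active FLOPs-per-token rule of thumb for a forward+backward pass (the cluster size and per-GPU peak below are hypothetical, chosen only to illustrate the calculation):

```python
def mfu(tokens_per_sec: float, active_params_b: float,
        peak_tflops_per_gpu: float, n_gpus: int) -> float:
    """Model FLOPs Utilization: achieved training FLOPs/s over cluster peak.
    Uses the ~6 * N_active FLOPs-per-token approximation (forward + backward)."""
    achieved = 6 * active_params_b * 1e9 * tokens_per_sec
    peak = peak_tflops_per_gpu * 1e12 * n_gpus
    return achieved / peak

# Hypothetical setup: 47B activated params, 64 GPUs at 989 TFLOPS peak each.
# A throughput of ~105k training tokens/s works out to roughly 47% MFU.
print(f"{mfu(105_000, 47, 989, 64):.2%}")
```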
Performance & Capabilities
Benchmark Achievement: Experimental results show that this model series achieves state-of-the-art (SOTA) performance across multiple text and multimodal benchmarks, with particularly outstanding results in:

- Instruction following – Understanding and executing complex commands
- World knowledge retention – Comprehensive factual knowledge storage and recall
- Visual understanding – Advanced image comprehension capabilities
- Multimodal reasoning tasks – Complex reasoning across text and visual inputs
Model Specifications (ERNIE-4.5-300B-A47B):
- Total Parameters: 300B with 47B activated per token
- Architecture: 54 layers, 64 query heads / 8 key-value heads
- Expert Configuration: 64 text experts (8 activated) / 64 vision experts (8 activated)
- Context Length: 131,072 tokens
- Modality: Text with multimodal training capabilities
Accessibility & Deployment:
- Apache 2.0 license – Model weights are open-sourced for both academic research and industrial applications
- Industrial-grade development toolkit – Based on PaddlePaddle’s comprehensive suite with ERNIEKit support
- Broad chip compatibility – Works across various hardware platforms, lowering barriers for post-training and deployment
- Excellent inference performance – Multiple deployment options including FastDeploy, Transformers, and vLLM integration
- Flexible quantization – 4-bit, 2-bit, and FP8 options for different resource constraints
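Back-of-the-envelope arithmetic shows why those quantization options matter. The sketch below counts weight storage only (it ignores KV cache, activations, and framework overhead, so real deployments need headroom on top of these numbers):

```python
def weight_memory_gb(total_params_billions: float, bits_per_param: float) -> float:
    """Memory needed just to store the weights, in GB (1 GB = 1e9 bytes)."""
    return total_params_billions * 1e9 * bits_per_param / 8 / 1e9

# ERNIE-4.5-300B-A47B weight footprint at different precisions:
for label, bits in [("FP16", 16), ("FP8", 8), ("4-bit", 4), ("2-bit", 2)]:
    print(f"{label:>5}: ~{weight_memory_gb(300, bits):,.0f} GB")
```

Roughly 150 GB of 4-bit weights explains why that variant spreads across several 80GB GPUs, while ~75 GB at 2-bit can fit on a single large-memory GPU.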
Technical Innovations
Multimodal Mixture of Experts Model Pre-training
The Approach: ERNIE 4.5 performs joint training across text and visual modalities to better capture subtle differences in multimodal information, improving performance in text generation, image understanding, and multimodal reasoning tasks.
The Innovation: To enable mutual enhancement between the two modalities during learning while preventing one modality from hindering another’s learning, Baidu proposes a multimodal heterogeneous mixture of experts model structure with:
- Modality-isolated routing for specialized expert allocation
- Router orthogonal loss to enhance expert specialization
- Multimodal token-balanced loss for optimal resource utilization across modalities
Advanced Optimization: These architectural choices ensure that both modalities are effectively represented, allowing for multimodal mutual promotion and improvement during training.
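As a toy illustration of modality-isolated routing (not Baidu's implementation; the dimensions, names, and random weights here are made up), each modality gets its own router and expert pool, so a text token's top-k selection can never land on a vision expert:

```python
import math
import random

random.seed(0)
D, EXPERTS_PER_MODALITY, TOP_K = 16, 4, 2

# One router weight matrix per modality: a text token is only ever scored
# against text experts, and a vision token only against vision experts.
routers = {
    m: [[random.gauss(0, 1) for _ in range(EXPERTS_PER_MODALITY)] for _ in range(D)]
    for m in ("text", "vision")
}

def route(token, modality):
    """Return (expert indices, gate weights) for one token of the given modality."""
    w = routers[modality]
    logits = [sum(token[i] * w[i][e] for i in range(D))
              for e in range(EXPERTS_PER_MODALITY)]
    top = sorted(range(EXPERTS_PER_MODALITY), key=lambda e: logits[e])[-TOP_K:]
    m = max(logits[e] for e in top)
    exps = [math.exp(logits[e] - m) for e in top]      # stable softmax
    gates = [x / sum(exps) for x in exps]
    return top, gates

token = [random.gauss(0, 1) for _ in range(D)]
experts, gates = route(token, "text")
print(experts, sum(gates))  # two text-expert indices; gate weights sum to 1
```

The router-orthogonality and token-balance losses from the list above would be added on top of a gate like this during training.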
Efficient Training and Inference Framework
Training Optimizations: To support efficient training of ERNIE 4.5 models, Baidu proposes heterogeneous hybrid parallelism and hierarchical load balancing strategies. Through multiple advanced technologies, they significantly improve pre-training throughput:
- Intra-node expert parallelism – Optimized parallel processing within computing nodes
- Memory-efficient pipeline scheduling – Smart memory management during training
- FP8 mixed precision training – Advanced numerical precision techniques
- Fine-grained recomputation – Strategic recomputation for memory efficiency
Inference Breakthroughs: For inference optimization, they propose several cutting-edge methods:
- Multi-expert parallel collaboration method – Collaborative processing across model experts
- Convolutional code quantization algorithm – Advanced encoding techniques for compression
- Near-lossless quantization – 4-bit and 2-bit quantization with minimal performance degradation
- PD disaggregation with dynamic role switching – Adaptive deployment that can more fully utilize resources and improve inference performance of ERNIE 4.5 MoE models
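Baidu has not published the details of its convolutional code quantization, but the general idea behind low-bit weight compression can be sketched with a plain round-to-nearest 4-bit scheme (a deliberately simple baseline, not the actual algorithm):

```python
import random

def quantize_4bit(weights):
    """Symmetric round-to-nearest 4-bit quantization of a weight list.
    Returns (integer codes in [-8, 7], a per-tensor scale factor)."""
    scale = max(abs(w) for w in weights) / 7
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    return [c * scale for c in codes]

random.seed(0)
weights = [random.gauss(0, 1) for _ in range(1000)]
codes, scale = quantize_4bit(weights)
restored = dequantize(codes, scale)
mean_err = sum(abs(a - b) for a, b in zip(weights, restored)) / len(weights)
print(f"mean abs error: {mean_err:.4f}")
```

Each weight shrinks from 16 bits to 4 plus a shared scale; the "near-lossless" claim means keeping that reconstruction error small enough that benchmark scores barely move.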
Modality-Specific Post-Training
Tailored Optimization: To meet different requirements in practical scenarios, Baidu performs modality-specific fine-tuning on the pre-trained models:
Large Language Models (LLMs):
- Optimized specifically for general language understanding and generation
Vision-Language Models (VLMs):
- Focus on visual-language understanding
- Support both thinking and non-thinking modes
Multi-stage Training Pipeline: Each model employs multi-stage post-training using advanced techniques:
- SFT (Supervised Fine-Tuning) – Learning from supervised examples
- DPO (Direct Preference Optimization) – Direct optimization based on preferences
- UPO (Unified Preference Optimization) – Baidu’s proprietary unified preference optimization technique
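Of these, DPO has a particularly compact objective that is easy to write down. A minimal sketch of the per-pair loss (standard DPO, not Baidu's UPO variant; the log-probabilities below are made-up example values):

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair: -log sigmoid(beta * margin), where the
    margin measures how much the policy favors the chosen response over the
    rejected one, relative to a frozen reference model."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The loss shrinks as the policy favors the chosen response more strongly:
print(dpo_loss(-10.0, -12.0, -10.5, -11.0) < dpo_loss(-11.0, -11.0, -10.5, -11.0))  # → True
```

UPO then unifies preference signals across stages, but its exact formulation is Baidu's own.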
Deployment and Integration
ERNIE-4.5 models can be deployed using FastDeploy, Hugging Face Transformers, or vLLM. Different quantization levels and serving frameworks allow the models to run efficiently across a range of hardware setups:
- Full-precision models require many GPUs (typically 16 GPUs with at least 80GB VRAM each).
- Quantized models (like WINT4, W4A8C8, or WINT2) drastically reduce VRAM needs. For example, WINT4 or W4A8C8 can run on 4–8×80GB GPUs, while WINT2 enables single-GPU deployment if you have at least 141GB VRAM.
- Transformers integration allows for flexible use but still requires substantial VRAM for large models.
- vLLM is ideal for high-throughput, multi-GPU inference. Quantized models help fit within available GPU memory.
- Recommended sampling: Temperature=0.8, Top-P=0.8
How to Access ERNIE-4.5-300B-A47B on Novita AI
Getting started with ERNIE-4.5-300B-A47B on Novita AI is streamlined and risk-free. New users receive $10 in free credits, enough to explore the model without upfront costs.
Use the Playground (No Coding Required)
Instant Access: Sign up, claim your free credits, and start experimenting with ERNIE 4.5 and other top models in seconds.
Interactive UI: Test prompts, chain-of-thought reasoning, and visualize results in real time.
Model Comparison: Effortlessly switch between ERNIE 4.5, Qwen 3, Llama 4, DeepSeek, and more to find the perfect fit for your needs.
Integrate via API (For Developers)
Seamlessly connect ERNIE 4.5 to applications, workflows, or chatbots using Novita AI’s unified REST API. No model weight management or infrastructure concerns—Novita AI provides multi-language SDKs (Python, Node.js, cURL) and advanced parameter controls.
Option 1: Direct API Integration (cURL and Python)
cURL:
curl "https://api.novita.ai/v3/openai/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer " \
  -d @- << 'EOF'
{
  "model": "baidu/ernie-4.5-300b-a47b-paddle",
  "messages": [
    {
      "role": "system",
      "content": "Be a helpful assistant"
    },
    {
      "role": "user",
      "content": "Hi there!"
    }
  ],
  "response_format": { "type": "text" },
  "max_tokens": 32768,
  "temperature": 1,
  "top_p": 1,
  "min_p": 0,
  "top_k": 50,
  "presence_penalty": 0,
  "frequency_penalty": 0,
  "repetition_penalty": 1
}
EOF

Python (OpenAI SDK):
from openai import OpenAI

client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",
    api_key="",
)

model = "baidu/ernie-4.5-300b-a47b-paddle"
stream = True  # or False
max_tokens = 6000
system_content = "Be a helpful assistant"
temperature = 1
top_p = 1
min_p = 0
top_k = 50
presence_penalty = 0
frequency_penalty = 0
repetition_penalty = 1
response_format = {"type": "text"}

chat_completion_res = client.chat.completions.create(
    model=model,
    messages=[
        {"role": "system", "content": system_content},
        {"role": "user", "content": "Hi there!"},
    ],
    stream=stream,
    max_tokens=max_tokens,
    temperature=temperature,
    top_p=top_p,
    presence_penalty=presence_penalty,
    frequency_penalty=frequency_penalty,
    response_format=response_format,
    extra_body={
        "top_k": top_k,
        "repetition_penalty": repetition_penalty,
        "min_p": min_p,
    },
)

if stream:
    for chunk in chat_completion_res:
        print(chunk.choices[0].delta.content or "", end="")
else:
    print(chat_completion_res.choices[0].message.content)
Option 2: Multi-Agent Workflows with OpenAI Agents SDK
Build advanced multi-agent systems by integrating Novita AI with the OpenAI Agents SDK:
- Plug-and-play: Use Novita AI’s ERNIE 4.5 in any OpenAI Agents workflow
- Supports handoffs, routing, and tool use: Design agents that can delegate, triage, or run functions, all powered by ERNIE 4.5’s capabilities
- Python integration: Simply point the SDK to Novita’s endpoint (https://api.novita.ai/v3/openai) and use your API key
Connect ERNIE 4.5 API on Third-Party Platforms
- Hugging Face: Use ERNIE 4.5 in Spaces, pipelines, or with the Transformers library via Novita AI endpoints.
- Agent & Orchestration Frameworks: Easily connect Novita AI with partner platforms like Continue, AnythingLLM, LangChain, Dify and Langflow through official connectors and step-by-step integration guides.
- OpenAI-Compatible API: Enjoy hassle-free migration and integration with tools such as Cline and Cursor, designed for the OpenAI API standard.
Conclusion
ERNIE 4.5 is a versatile, open-source AI model series that combines advanced Mixture-of-Experts architecture with innovative multimodal learning. It enables powerful, efficient performance across both language and vision tasks, making it a strong foundation for next-generation AI applications.
Ready to experience the future of AI reasoning? Try ERNIE 4.5 on Novita AI.
Novita AI is an AI cloud platform that offers developers an easy way to deploy AI models using our simple API, while also providing the affordable and reliable GPU cloud for building and scaling.