ERNIE 4.5 is Baidu’s advanced AI model family for powerful text and multimodal processing. With options for both large-scale and lightweight deployment, ERNIE 4.5 offers efficient performance and cost-effective access for developers and businesses. Whether you’re working with text, images, or both, ERNIE 4.5 can be accessed easily through web interfaces, APIs, and cloud platforms—no complex setup required.
Simple Introduction to ERNIE 4.5
ERNIE 4.5 is a family of AI models developed by Baidu for efficient text and multimodal processing. The larger variants use Mixture of Experts (MoE) architectures, while the smallest uses a dense architecture. For MoE models, only the active parameters (for example, 47B of 424B) are used for each token. The models support text and vision modalities and come in post-trained (PT) and base versions. The table below lists the key model variants, followed by a summary of ERNIE's innovations across the AI training flow.
| Model | Total Parameters | Active Parameters | Architecture | Modality | Training |
|---|---|---|---|---|---|
| ERNIE 4.5 VL 424B A47B | 424B | 47B | MoE | T+V | PT |
| ERNIE 4.5 VL 424B A47B Base | 424B | 47B | MoE | T+V | Base |
| ERNIE 4.5 VL 28B A3B | 28B | 3B | MoE | T+V | PT |
| ERNIE 4.5 VL 28B A3B Base | 28B | 3B | MoE | T+V | Base |
| ERNIE 4.5 VL 28B A3B Thinking | 28B | 3B | MoE | T+V | PT |
| ERNIE 4.5 300B A47B | 300B | 47B | MoE | Text | PT |
| ERNIE 4.5 300B A47B Base | 300B | 47B | MoE | Text | Base |
| ERNIE 4.5 21B A3B | 21B | 3B | MoE | Text | PT |
| ERNIE 4.5 21B A3B Base | 21B | 3B | MoE | Text | Base |
| ERNIE 4.5 21B A3B Thinking | 21B | 3B | MoE | Text | PT |
| ERNIE 4.5 0.3B | 0.3B | – | Dense | Text | PT |
| ERNIE 4.5 0.3B Base | 0.3B | – | Dense | Text | Base |
AI Training Flow: ERNIE Innovations Highlighted
1. Multimodal Heterogeneous MoE Pre-Training
Joint text & vision pre-training with heterogeneous MoE structure, modality-isolated routing, and balanced multimodal loss.
2. Scaling-Efficient Infrastructure
Hybrid parallelism, hierarchical load balancing, expert parallelism, memory-optimized scheduling, and lossless quantization for high throughput and efficient inference.
3. Modality-Specific Post-Training
Fine-tuning for text or vision tasks, supporting SFT, DPO, and UPO to meet diverse real-world application needs.
Performance Comparison: ERNIE 4.5 vs. GPT-4o
ERNIE 4.5 is presented here as delivering strong performance and better cost efficiency than GPT-4o, making it a competitive choice for large-scale AI deployments. The prices on Novita AI are as follows:
- ERNIE 4.5 VL 424B A47B
  - $0.336 per 1M input tokens
  - $1 per 1M output tokens
- ERNIE 4.5 300B A47B
  - $0.224 per 1M input tokens
  - $0.88 per 1M output tokens
- ERNIE 4.5 21B A3B / ERNIE 4.5 21B A3B Thinking
  - $0.056 per 1M input tokens
  - $0.224 per 1M output tokens
- ERNIE 4.5 VL 28B A3B
  - $0.112 per 1M input tokens
  - $0.448 per 1M output tokens
- ERNIE 4.5 VL 28B A3B Thinking
  - $0.39 per 1M input tokens
  - $0.39 per 1M output tokens
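To turn the per-token prices into a budget figure, the following is a minimal sketch that multiplies token counts by the listed rates. The function name `estimate_cost` and the dictionary keys are illustrative labels chosen for this example, not official API model IDs, and the prices are simply copied from the list above.

```python
from decimal import Decimal

# USD per 1M tokens as (input, output), copied from the price list above.
# Keys are illustrative labels, not official API model identifiers.
PRICES_PER_MILLION = {
    "ERNIE 4.5 VL 424B A47B": (Decimal("0.336"), Decimal("1")),
    "ERNIE 4.5 300B A47B": (Decimal("0.224"), Decimal("0.88")),
    "ERNIE 4.5 21B A3B": (Decimal("0.056"), Decimal("0.224")),
    "ERNIE 4.5 VL 28B A3B": (Decimal("0.112"), Decimal("0.448")),
    "ERNIE 4.5 VL 28B A3B Thinking": (Decimal("0.39"), Decimal("0.39")),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated cost in USD for the given token counts."""
    input_price, output_price = PRICES_PER_MILLION[model]
    total = input_price * input_tokens + output_price * output_tokens
    return total / Decimal(1_000_000)


if __name__ == "__main__":
    # 2M input tokens and 500K output tokens on the 300B text model.
    cost = estimate_cost("ERNIE 4.5 300B A47B", 2_000_000, 500_000)
    print(f"${cost:.3f}")  # prints $0.888
```

`Decimal` is used instead of floats so that small per-token amounts add up without rounding noise.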

Access ERNIE 4.5 Through Baidu Platform (Free Trial)
You can try it out directly through the Baidu platform’s web interface, with no installation required. Simply visit the website and start your free trial instantly.

Alternatively, you can use the Novita API Playground to experiment with ERNIE 4.5 in a developer-friendly environment.

Access ERNIE 4.5 Locally
What are the system requirements for using ERNIE 4.5?
FP16 Precision
| Model | Parameters (Active) | VRAM Needed | Ideal GPU(s) |
|---|---|---|---|
| ERNIE 4.5 VL 424B | 424B (47B active) | ~945 GB | NVIDIA H100 (80GB) × 12 |
| ERNIE 4.5 300B | 300B (47B active) | ~668 GB | NVIDIA H100 (80GB) × 9 |
| ERNIE 4.5 VL 28B / ERNIE 4.5 VL 28B A3B Thinking | 28B (3B active) | ~80 GB | NVIDIA A100/H100 (80GB) |
| ERNIE 4.5 21B / ERNIE 4.5 21B A3B Thinking | 21B (3B active) | ~48 GB | NVIDIA RTX 4090 (24GB) × 2 |
| ERNIE 4.5 0.3B | 300M | ~2.5 GB | NVIDIA RTX 4060 (8GB) / RTX 3060 (12GB) |
INT4 Precision
| Model | Parameters (Active) | VRAM Needed | Ideal GPU(s) |
|---|---|---|---|
| ERNIE 4.5 VL 424B | 424B (47B active) | ~237 GB | NVIDIA H100 (80GB) × 3 |
| ERNIE 4.5 300B | 300B (47B active) | ~168 GB | NVIDIA H100 (80GB) × 3 |
| ERNIE 4.5 VL 28B / ERNIE 4.5 VL 28B A3B Thinking | 28B (3B active) | ~17 GB | NVIDIA RTX 4090 (24GB) / A10G (24GB) |
| ERNIE 4.5 21B / ERNIE 4.5 21B A3B Thinking | 21B (3B active) | ~13 GB | NVIDIA RTX 4080 (16GB) / A10G (24GB) |
| ERNIE 4.5 0.3B | 300M | ~1.8 GB | Most GPUs with >4GB VRAM |
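The figures in the two tables can be sanity-checked with simple arithmetic. The sketch below is a rough estimate rather than an official sizing method: it computes the memory for the weights alone (2 bytes per parameter at FP16, 0.5 bytes at INT4) and the number of 80 GB GPUs needed to cover a given VRAM figure. An MoE model must hold all of its parameters in memory, so the total count is used, not the active count. The table values are higher than the weights-only estimate, presumably because they include runtime overhead; that interpretation is an assumption, and the function names are illustrative.

```python
import math

# Bytes per parameter for the two precisions in the tables above.
BYTES_PER_PARAM = {"fp16": 2.0, "int4": 0.5}


def weight_memory_gb(total_params_billions: float, precision: str) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes).

    Uses the total parameter count: every expert of an MoE model
    has to be resident in memory, even if only a few are active.
    """
    return total_params_billions * BYTES_PER_PARAM[precision]


def gpus_needed(vram_needed_gb: float, vram_per_gpu_gb: float = 80.0) -> int:
    """Smallest number of GPUs whose combined VRAM covers the requirement."""
    return math.ceil(vram_needed_gb / vram_per_gpu_gb)


if __name__ == "__main__":
    print(weight_memory_gb(424, "fp16"))  # 848.0 (table lists ~945 GB)
    print(weight_memory_gb(300, "int4"))  # 150.0 (table lists ~168 GB)
    print(gpus_needed(945))               # 12 x H100 80GB, as in the FP16 table
    print(gpus_needed(237))               # 3 x H100 80GB, as in the INT4 table
```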
Based on the official ERNIEKit toolkit and open-source release:
- OS: Linux is strongly recommended (Ubuntu or similar).
- Framework: PaddlePaddle (latest version) is required.
- Inference and training: use ERNIEKit (based on PaddlePaddle).
- Deployment: can be accelerated with FastDeploy.
- Dependencies:
  - Python 3.8+
  - CUDA and cuDNN versions matching your GPU setup
- PyTorch environments: models are also available via `transformers` with `trust_remote_code=True`.
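Before installing anything, it can help to check a machine against the requirements listed above. The following is a minimal sketch using only the Python standard library; the function name `check_environment` and the result keys are made up for this example. It only detects whether PaddlePaddle (importable as `paddle`) and `nvidia-smi` are present, and does not verify CUDA or cuDNN versions.

```python
import importlib.util
import platform
import shutil
import sys


def check_environment() -> dict:
    """Return a pass/fail flag for each basic requirement."""
    return {
        # Linux is the recommended operating system.
        "linux": platform.system() == "Linux",
        # Python 3.8 or newer is required.
        "python_3_8_plus": sys.version_info >= (3, 8),
        # PaddlePaddle installs as the importable module "paddle".
        "paddlepaddle_installed": importlib.util.find_spec("paddle") is not None,
        # nvidia-smi on PATH is a rough proxy for a working NVIDIA driver;
        # it does not confirm that CUDA and cuDNN match the GPU.
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{'OK' if ok else 'MISSING':8}{name}")
```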
If purchasing GPUs is too costly, you can use Novita AI's cost-effective and reliable cloud GPU services instead. For instance, a 1x H100 SXM 80GB instance costs $2.56 per hour, and you can scale up to 8 GPUs for $20.48 per hour.
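As a quick worked example of what renting costs over time, the sketch below multiplies the hourly H100 rate quoted above by GPU count and hours. It assumes the price scales linearly per GPU, which matches the quoted 8-GPU price; the function name `rental_cost` is illustrative.

```python
from decimal import Decimal

# Hourly price of one H100 SXM 80GB instance, as quoted above.
H100_HOURLY_USD = Decimal("2.56")


def rental_cost(num_gpus: int, hours: float) -> Decimal:
    """Estimated rental cost in USD, assuming linear per-GPU pricing."""
    return H100_HOURLY_USD * num_gpus * Decimal(str(hours))


if __name__ == "__main__":
    print(rental_cost(8, 1))   # 20.48, the quoted 8-GPU hourly price
    print(rental_cost(3, 24))  # 184.32, three H100s for one day
```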
Access ERNIE 4.5 from Python Application
- Hugging Face: Use ERNIE 4.5 in Spaces, pipelines, or with the Transformers library via Novita AI endpoints.
- Agent & Orchestration Frameworks: Easily connect Novita AI with partner platforms like Continue, AnythingLLM, LangChain, Dify and Langflow through official connectors and step-by-step integration guides.
- OpenAI-Compatible API: Enjoy hassle-free migration and integration with tools such as Cline and Cursor, designed for the OpenAI API standard.
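Because the endpoint follows the OpenAI API standard, any tool that can send a standard chat completions request can talk to it. The sketch below illustrates this with only the Python standard library. The base URL and model name are taken from the Python example later in this article; the `/chat/completions` path follows the OpenAI convention, and the helper names `build_chat_request` and `send` are made up for this illustration.

```python
import json
import urllib.request

# Base URL taken from the Python example later in this article.
BASE_URL = "https://api.novita.ai/v3/openai"


def build_chat_request(api_key: str, model: str, user_message: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completions request without sending it."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # Replace the placeholder with a real API key before running.
    request = build_chat_request("YOUR_API_KEY", "baidu/ernie-4.5-300b-a47b-paddle", "Hi there!")
    print(send(request))
```

In practice the `openai` client library handles these details for you; the raw request is shown only to make clear what "OpenAI-compatible" means on the wire.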

Access ERNIE 4.5 via API
Step 1: Log In and Access the Model Library
Log in to your account and click on the Model Library button.

Step 2: Choose Your Model
Browse through the available options and select the model that suits your needs.

Step 3: Start Your Free Trial
Begin your free trial to explore the capabilities of the selected model.

Step 4: Get Your API Key
To authenticate with the API, you need an API key. Open the "Settings" page and copy your API key from there.

Step 5: Install the API Client
Install the client library with the package manager for your programming language. The Python example in this guide uses the openai package, which can be installed with pip install openai.
After installation, import the library into your development environment and initialize the client with your API key to start interacting with Novita AI's LLM API. The following example uses the chat completions API from Python.
```python
from openai import OpenAI

# Point the OpenAI client at Novita AI's OpenAI-compatible endpoint.
client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",
    api_key="",  # paste your Novita AI API key here
)

model = "baidu/ernie-4.5-300b-a47b-paddle"
stream = True  # or False
max_tokens = 6000
system_content = "Be a helpful assistant"
temperature = 1
top_p = 1
min_p = 0
top_k = 50
presence_penalty = 0
frequency_penalty = 0
repetition_penalty = 1
response_format = {"type": "text"}

chat_completion_res = client.chat.completions.create(
    model=model,
    messages=[
        {"role": "system", "content": system_content},
        {"role": "user", "content": "Hi there!"},
    ],
    stream=stream,
    max_tokens=max_tokens,
    temperature=temperature,
    top_p=top_p,
    presence_penalty=presence_penalty,
    frequency_penalty=frequency_penalty,
    response_format=response_format,
    # Parameters outside the standard OpenAI schema are passed via extra_body.
    extra_body={
        "top_k": top_k,
        "repetition_penalty": repetition_penalty,
        "min_p": min_p,
    },
)

if stream:
    # Streaming mode: print each chunk of text as it arrives.
    for chunk in chat_completion_res:
        print(chunk.choices[0].delta.content or "", end="")
else:
    # Non-streaming mode: the full reply is available at once.
    print(chat_completion_res.choices[0].message.content)
```
Accessing ERNIE 4.5 is flexible and straightforward—choose the approach that fits your workflow, from instant web trials to robust API integration and local deployment. With superior performance and efficient pricing, ERNIE 4.5 is a practical choice for next-generation AI applications.
Frequently Asked Questions
Is ERNIE 4.5 competitive with other leading models?
Yes, ERNIE 4.5 scores higher than DeepSeek V3 671B in most benchmarks and is very competitive with other top models.
What are the system requirements for running ERNIE 4.5 locally?
Requirements vary by model size, but you'll need a Linux system, Python 3.8+, PaddlePaddle, and a compatible NVIDIA GPU. Cloud GPU options are available if you don't have local hardware.
How much VRAM does ERNIE 4.5 need?
Running the largest versions of ERNIE 4.5 (like 424B or 300B) requires very high VRAM: hundreds of GBs spread across multiple high-end GPUs. Smaller or quantized versions need much less VRAM.
Novita AI is an all-in-one cloud platform for AI development. It offers integrated APIs, serverless, and GPU instances as cost-effective tools, so you can skip infrastructure management, start for free, and make your AI vision a reality.