DeepSeek V3 vs. Qwen 2.5 72B: Precision vs. Multilingual Efficiency
By Novita AI / March 18, 2025
Key Highlights
✅ Training Methods: DeepSeek V3 uses pre-training → SFT → RL for dynamic adaptability; Qwen 2.5 relies on domain-specific pretraining (e.g., code, math).
✅ Performance: DeepSeek V3 leads in coding (36% vs. 28%), math (89% vs. 86%), and reasoning benchmarks; Qwen 2.5 excels in multilingual tasks (29 languages vs. 3).
✅ Cost & Speed: Qwen 2.5 offers lower cost ($0.38/M input tokens) and faster output; DeepSeek V3 Turbo offers 3× throughput plus a 20% discount for high-volume needs on Novita AI.
If you want to evaluate DeepSeek V3 and Qwen 2.5 72B on your own use cases, Novita AI provides a $0.5 credit upon registration to get you started!
The competition between leading open large language models intensifies with DeepSeek V3 (December 2024), a Mixture of Experts (MoE) model, and Qwen 2.5 72B (September 2024), a dense model. While DeepSeek targets technical precision and dynamic interaction, Qwen prioritizes multilingual efficiency and cost savings. This comparison explores their strengths, weaknesses, and ideal use cases.
Model Overview

| Feature | DeepSeek V3 | Qwen 2.5 72B |
| --- | --- | --- |
| Supported Languages | 3 languages | Strong multilingual support for over 29 languages |
| Multimodal | Text-only | Text-only |
| Context Window | Up to 128K tokens | Up to 128K tokens, with generation of up to 8K tokens |
| Architecture | Mixture of Experts (MoE) + Multi-Head Latent Attention | Dense Transformer |
| Training Data | 14.8 trillion tokens | 18 trillion tokens |
| Training Method | Pre-training → SFT → RL | Domain-specific pretraining |
DeepSeek V3 leverages multi-stage training with Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL), enabling continuous optimization from human feedback (e.g., instruction following, safety alignment). Its MoE architecture dynamically adjusts expert weights, allowing a single model to flexibly adapt to multi-domain tasks (e.g., code generation, mathematical reasoning) without requiring task-specific retraining.
In contrast, Qwen 2.5 72B relies primarily on pretraining and requires separately trained specialized models for different domains (e.g., Qwen2.5-Coder for code and Qwen2.5-Math for mathematics). Although these specialized models achieve significant performance improvements through massive domain-specific data (e.g., 5.5T code tokens for Qwen2.5-Coder) and multiple reasoning methods (CoT, PoT, TIR), their generalization is limited by static data distributions, making them better suited for specialized tasks (e.g., programming evaluation, bilingual mathematical reasoning) than for dynamic interactive scenarios.
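To illustrate the dynamic expert routing mentioned above, here is a minimal toy sketch of top-k gating, the mechanism generic MoE layers use to activate only a few experts per token. This is a NumPy illustration, not DeepSeek V3's actual implementation, and the expert count, top-k value, and layer sizes are arbitrary.

import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def moe_layer(x, expert_weights, router_weights, top_k=2):
    """Route a token vector x to the top_k highest-scoring experts and
    combine their outputs, weighted by the renormalized gate scores."""
    logits = router_weights @ x                # one routing score per expert
    gates = softmax(logits)
    top = np.argsort(gates)[-top_k:]           # indices of the selected experts
    weights = gates[top] / gates[top].sum()    # renormalize over selected experts
    outputs = [expert_weights[i] @ x for i in top]
    return sum(w * o for w, o in zip(weights, outputs))

# Toy setup: 8 experts, 16-dimensional hidden states (arbitrary sizes).
rng = np.random.default_rng(0)
num_experts, dim = 8, 16
experts = rng.normal(size=(num_experts, dim, dim))
router = rng.normal(size=(num_experts, dim))
token = rng.normal(size=dim)
print(moe_layer(token, experts, router).shape)  # (16,)

Only the selected experts' weights are used for each token, which is why a single MoE model can cover many domains without activating all of its parameters at once.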
Speed Comparison
If you want to test it yourself, you can start a free trial on the Novita AI website.
Qwen 2.5 72B outperforms DeepSeek V3 in both output speed and latency, and its input and output token prices are significantly lower than DeepSeek V3's.
It is also worth noting that Novita AI has launched a Turbo version of DeepSeek V3 with 3× throughput and a limited-time 20% discount. Try it now!
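To put the price difference in concrete terms, here is a rough cost estimate using the input-token prices quoted in the FAQ below ($0.38 per million tokens for Qwen 2.5 72B and $0.89 for DeepSeek V3). Output-token pricing is not included and the monthly volume is an arbitrary example, so treat the numbers as illustrative only.

# Rough input-token cost comparison (illustrative only; prices from this article's FAQ).
PRICES_PER_MILLION = {"Qwen 2.5 72B": 0.38, "DeepSeek V3": 0.89}  # USD per 1M input tokens
monthly_input_tokens = 50_000_000  # hypothetical workload: 50M input tokens per month

for model, price in PRICES_PER_MILLION.items():
    cost = monthly_input_tokens / 1_000_000 * price
    print(f"{model}: ${cost:.2f} per month for input tokens")
# Qwen 2.5 72B: $19.00 per month for input tokens
# DeepSeek V3: $44.50 per month for input tokens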
Benchmark Comparison
Now that we’ve established the basic characteristics of each model, let’s delve into their performance across various benchmarks. This comparison will help illustrate their strengths in different areas.
| Benchmark | DeepSeek V3 (%) | Qwen 2.5 72B (%) |
| --- | --- | --- |
| LiveCodeBench (Coding) | 36 | 28 |
| GPQA Diamond | 56 | 49 |
| MATH-500 | 89 | 86 |
| MMLU-Pro | 76 | 72 |
These results suggest that DeepSeek V3's iterative reinforcement learning approach is particularly effective in specialized technical domains that demand precise reasoning and structured problem-solving.
Beyond these benchmarks, Qwen 2.5's domain-specialized models offer:
Massive domain-specific data (5.5T code tokens for Qwen2.5-Coder).
Diverse reasoning methods (CoT, PoT, TIR) for structured tasks.
Accessibility and Deployment through Novita AI
Novita AI is an AI cloud platform that offers developers an easy way to deploy AI models through a simple API, while also providing an affordable and reliable GPU cloud for building and scaling.
Step 1: Log In and Access the Model Library
Log in to your account and click on the Model Library button.
Step 2: Choose Your Model
Browse through the available options and select the model that suits your needs.
Step 3: Start Your Free Trial
Begin your free trial to explore the capabilities of the selected model.
Step 4: Get Your API Key
To authenticate with the API, you need an API key. Go to the “Settings” page and copy your API key.
Step 5: Install the Client Library
Install the API client using the package manager for your programming language. For the Python example below, the OpenAI-compatible client is installed with pip install openai.
After installation, import the necessary libraries into your development environment and initialize the client with your API key to start interacting with the Novita AI LLM service. The following is an example of using the chat completions API in Python.
from openai import OpenAI

# Point the OpenAI-compatible client at Novita AI's endpoint.
client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",
    api_key="<YOUR Novita AI API Key>",
)

model = "deepseek/deepseek_v3"
stream = True  # or False
max_tokens = 2048

# Prompt and sampling parameters.
system_content = """Be a helpful assistant"""
temperature = 1
top_p = 1
min_p = 0
top_k = 50
presence_penalty = 0
frequency_penalty = 0
repetition_penalty = 1
response_format = {"type": "text"}

chat_completion_res = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "system",
            "content": system_content,
        },
        {
            "role": "user",
            "content": "Hi there!",
        },
    ],
    stream=stream,
    max_tokens=max_tokens,
    temperature=temperature,
    top_p=top_p,
    presence_penalty=presence_penalty,
    frequency_penalty=frequency_penalty,
    response_format=response_format,
    # Parameters not in the standard OpenAI schema are passed through extra_body.
    extra_body={
        "top_k": top_k,
        "repetition_penalty": repetition_penalty,
        "min_p": min_p,
    },
)

if stream:
    # Print tokens as they arrive.
    for chunk in chat_completion_res:
        print(chunk.choices[0].delta.content or "", end="")
else:
    print(chat_completion_res.choices[0].message.content)
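Since this article compares the two models, a natural next step is to send the same prompt to both and inspect the answers side by side. The sketch below is illustrative: the environment variable name NOVITA_API_KEY is just an example for avoiding a hardcoded key, and the Qwen model identifier is a placeholder, so look up the exact string in Novita AI's Model Library before running it.

import os
from openai import OpenAI

# Reuse the same Novita AI endpoint; reading the key from an environment
# variable (name chosen here for illustration) avoids hardcoding it.
client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",
    api_key=os.environ["NOVITA_API_KEY"],
)

# Replace the Qwen identifier with the exact string listed in the Model Library.
models_to_compare = [
    "deepseek/deepseek_v3",
    "<QWEN 2.5 72B model ID from the Model Library>",
]
prompt = "Write a Python function that checks whether a string is a palindrome."

for model_id in models_to_compare:
    res = client.chat.completions.create(
        model=model_id,
        messages=[
            {"role": "system", "content": "Be a helpful assistant"},
            {"role": "user", "content": prompt},
        ],
        max_tokens=512,
        temperature=0,  # low temperature keeps outputs comparable across runs
    )
    print(f"===== {model_id} =====")
    print(res.choices[0].message.content)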
Upon registration, Novita AI provides a $0.5 credit to get you started!
Once the free credit is used up, you can pay to continue using the service.
Choose DeepSeek V3 for technical precision and adaptability, or Qwen 2.5 72B for cost-effective multilingual tasks. For enterprises, DeepSeek Turbo’s throughput boost and Novita AI’s free trial make it a compelling option.
Frequently Asked Questions
How do Qwen 2.5 72B and DeepSeek V3 compare on cost?
Qwen costs $0.38/M input tokens vs. DeepSeek’s $0.89/M.
Why choose Qwen 2.5?
For multilingual support (29 languages) or tight budgets.
How can I test Qwen 2.5 72B and DeepSeek V3?
Try both models on Novita AI; DeepSeek V3 Turbo is currently available with a 20% discount.
Novita AI is the All-in-one cloud platform that empowers your AI ambitions. Integrated APIs, serverless, GPU Instance — the cost-effective tools you need. Eliminate infrastructure, start free, and make your AI vision a reality.