DeepSeek V3 vs. Qwen 2.5 72B: Precision vs. Multilingual Efficiency
By Novita AI / March 18, 2025
Key Highlights
✅ Training Methods: DeepSeek V3 uses pre-training → SFT → RL for dynamic adaptability; Qwen 2.5 relies on domain-specific pretraining (e.g., code, math).
✅ Performance: DeepSeek V3 leads in coding (36% vs. 28%), math (89% vs. 86%), and reasoning benchmarks; Qwen 2.5 excels in multilingual tasks (29 languages vs. 3).
✅ Cost & Speed: Qwen 2.5 offers lower cost ($0.38/M input tokens) and faster output; DeepSeek V3 Turbo offers 3× throughput plus a 20% discount for high-volume needs on Novita AI.
If you want to evaluate DeepSeek V3 and Qwen 2.5 72B on your own use cases, Novita AI provides a $0.5 credit upon registration to get you started!
The competition between leading open large language models intensifies with DeepSeek V3 (December 2024), a Mixture of Experts (MoE) model, and Qwen 2.5 72B (September 2024), a dense model. While DeepSeek targets technical precision and dynamic interaction, Qwen prioritizes multilingual efficiency and cost savings. This comparison explores their strengths, weaknesses, and ideal use cases.
Model Overview

| Feature | DeepSeek V3 | Qwen 2.5 72B |
| --- | --- | --- |
| Supported Languages | 3 languages | Strong multilingual support for over 29 languages |
| Multimodal | Text-only | Text-only |
| Context Window | Up to 128K tokens | Up to 128K tokens, with generation of up to 8K tokens |
| Architecture | Mixture of Experts (MoE) + Multi-Head Latent Attention | Dense Transformer |
| Training Data | 14.8 trillion tokens | 18 trillion tokens |
| Training Method | Pre-training → SFT → RL | Domain-specific pretraining |
DeepSeek V3 leverages multi-stage training with Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL), enabling continuous optimization from human feedback (e.g., instruction following, safety alignment). Its MoE architecture dynamically adjusts expert weights, allowing a single model to flexibly adapt to multi-domain tasks (e.g., code generation, mathematical reasoning) without requiring task-specific retraining.
In contrast, Qwen 2.5 72B relies primarily on pretraining and requires separately trained specialized models for different domains (e.g., Qwen2.5-Coder for code and Qwen2.5-Math for mathematics). Although these specialized models achieve significant performance improvements through massive domain-specific data (e.g., 5.5T code tokens for Qwen2.5-Coder) and multiple reasoning methods (CoT, PoT, TIR), their generalization is limited by static data distributions, making them better suited for specialized tasks (e.g., programming evaluation, bilingual mathematical reasoning) than for dynamic interactive scenarios.
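To illustrate the dynamic expert routing mentioned above, here is a minimal toy sketch of top-k gating, the mechanism generic MoE layers use to activate only a few experts per token. This is a NumPy illustration, not DeepSeek V3's actual implementation, and the expert count, top-k value, and layer sizes are arbitrary.

import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def moe_layer(x, expert_weights, router_weights, top_k=2):
    """Route a token vector x to the top_k highest-scoring experts and
    combine their outputs, weighted by the renormalized gate scores."""
    logits = router_weights @ x                # one routing score per expert
    gates = softmax(logits)
    top = np.argsort(gates)[-top_k:]           # indices of the selected experts
    weights = gates[top] / gates[top].sum()    # renormalize over selected experts
    outputs = [expert_weights[i] @ x for i in top]
    return sum(w * o for w, o in zip(weights, outputs))

# Toy setup: 8 experts, 16-dimensional hidden states (arbitrary sizes).
rng = np.random.default_rng(0)
num_experts, dim = 8, 16
experts = rng.normal(size=(num_experts, dim, dim))
router = rng.normal(size=(num_experts, dim))
token = rng.normal(size=dim)
print(moe_layer(token, experts, router).shape)  # (16,)

Only the selected experts' weights are used for each token, which is why a single MoE model can cover many domains without activating all of its parameters at once.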
Speed Comparison
If you want to test it yourself, you can start a free trial on the Novita AI website.
Qwen 2.5 72B outperforms DeepSeek V3 in both output speed and latency, and its input and output token prices are significantly lower than DeepSeek V3's.
It is also worth noting that Novita AI has launched a Turbo version of DeepSeek V3 with 3× throughput and a limited-time 20% discount. Try it now!
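To put the price difference in concrete terms, here is a rough cost estimate using the input-token prices quoted in the FAQ below ($0.38 per million tokens for Qwen 2.5 72B and $0.89 for DeepSeek V3). Output-token pricing is not included and the monthly volume is an arbitrary example, so treat the numbers as illustrative only.

# Rough input-token cost comparison (illustrative only; prices from this article's FAQ).
PRICES_PER_MILLION = {"Qwen 2.5 72B": 0.38, "DeepSeek V3": 0.89}  # USD per 1M input tokens
monthly_input_tokens = 50_000_000  # hypothetical workload: 50M input tokens per month

for model, price in PRICES_PER_MILLION.items():
    cost = monthly_input_tokens / 1_000_000 * price
    print(f"{model}: ${cost:.2f} per month for input tokens")
# Qwen 2.5 72B: $19.00 per month for input tokens
# DeepSeek V3: $44.50 per month for input tokens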
Benchmark Comparison
Now that we’ve established the basic characteristics of each model, let’s delve into their performance across various benchmarks. This comparison will help illustrate their strengths in different areas.
| Benchmark | DeepSeek V3 (%) | Qwen 2.5 72B (%) |
| --- | --- | --- |
| LiveCodeBench (Coding) | 36 | 28 |
| GPQA Diamond | 56 | 49 |
| MATH-500 | 89 | 86 |
| MMLU-Pro | 76 | 72 |
These results suggest that DeepSeek V3's iterative reinforcement learning approach is particularly effective in specialized technical domains that demand precise reasoning and structured problem-solving.
Beyond these benchmarks, Qwen 2.5's domain-specialized models offer:
Massive domain-specific data (5.5T code tokens for Qwen2.5-Coder).
Diverse reasoning methods (CoT, PoT, TIR) for structured tasks.
Accessibility and Deployment through Novita AI
Novita AI is an AI cloud platform that offers developers an easy way to deploy AI models through a simple API, while also providing an affordable and reliable GPU cloud for building and scaling.
Step 1: Log In and Access the Model Library
Log in to your account and click on the Model Library button.
Step 2: Choose Your Model
Browse through the available options and select the model that suits your needs.
Step 3: Start Your Free Trial
Begin your free trial to explore the capabilities of the selected model.
Step 4: Get Your API Key
To authenticate with the API, you need an API key. Go to the “Settings” page and copy your API key.
Step 5: Install the Client Library
Install the API client using the package manager for your programming language. For the Python example below, the OpenAI-compatible client is installed with pip install openai.
After installation, import the necessary libraries into your development environment and initialize the client with your API key to start interacting with the Novita AI LLM service. The following is an example of using the chat completions API in Python.
from openai import OpenAI

# Point the OpenAI-compatible client at Novita AI's endpoint.
client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",
    api_key="<YOUR Novita AI API Key>",
)

model = "deepseek/deepseek_v3"
stream = True  # or False
max_tokens = 2048

# Prompt and sampling parameters.
system_content = """Be a helpful assistant"""
temperature = 1
top_p = 1
min_p = 0
top_k = 50
presence_penalty = 0
frequency_penalty = 0
repetition_penalty = 1
response_format = {"type": "text"}

chat_completion_res = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "system",
            "content": system_content,
        },
        {
            "role": "user",
            "content": "Hi there!",
        },
    ],
    stream=stream,
    max_tokens=max_tokens,
    temperature=temperature,
    top_p=top_p,
    presence_penalty=presence_penalty,
    frequency_penalty=frequency_penalty,
    response_format=response_format,
    # Parameters not in the standard OpenAI schema are passed through extra_body.
    extra_body={
        "top_k": top_k,
        "repetition_penalty": repetition_penalty,
        "min_p": min_p,
    },
)

if stream:
    # Print tokens as they arrive.
    for chunk in chat_completion_res:
        print(chunk.choices[0].delta.content or "", end="")
else:
    print(chat_completion_res.choices[0].message.content)
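Since this article compares the two models, a natural next step is to send the same prompt to both and inspect the answers side by side. The sketch below is illustrative: the environment variable name NOVITA_API_KEY is just an example for avoiding a hardcoded key, and the Qwen model identifier is a placeholder, so look up the exact string in Novita AI's Model Library before running it.

import os
from openai import OpenAI

# Reuse the same Novita AI endpoint; reading the key from an environment
# variable (name chosen here for illustration) avoids hardcoding it.
client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",
    api_key=os.environ["NOVITA_API_KEY"],
)

# Replace the Qwen identifier with the exact string listed in the Model Library.
models_to_compare = [
    "deepseek/deepseek_v3",
    "<QWEN 2.5 72B model ID from the Model Library>",
]
prompt = "Write a Python function that checks whether a string is a palindrome."

for model_id in models_to_compare:
    res = client.chat.completions.create(
        model=model_id,
        messages=[
            {"role": "system", "content": "Be a helpful assistant"},
            {"role": "user", "content": prompt},
        ],
        max_tokens=512,
        temperature=0,  # low temperature keeps outputs comparable across runs
    )
    print(f"===== {model_id} =====")
    print(res.choices[0].message.content)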
Upon registration, Novita AI provides a $0.5 credit to get you started!
Once the free credit is used up, you can pay to continue using the service.
Choose DeepSeek V3 for technical precision and adaptability, or Qwen 2.5 72B for cost-effective multilingual tasks. For enterprises, DeepSeek Turbo’s throughput boost and Novita AI’s free trial make it a compelling option.
Frequently Asked Questions
How do Qwen 2.5 72B and DeepSeek V3 compare on cost?
Qwen costs $0.38/M input tokens vs. DeepSeek’s $0.89/M.
Why choose Qwen 2.5?
For multilingual support (29 languages) or tight budgets.
How can I test Qwen 2.5 72B and DeepSeek V3?
Try both models on Novita AI; DeepSeek V3 Turbo is currently available with a 20% discount.
Novita AI is the All-in-one cloud platform that empowers your AI ambitions. Integrated APIs, serverless, GPU Instance — the cost-effective tools you need. Eliminate infrastructure, start free, and make your AI vision a reality.