DeepSeek V3 vs. Qwen 2.5 72B: Precision vs. Multilingual Efficiency


Key Highlights

Training Methods:
DeepSeek V3: Pre-training → SFT → RL for dynamic adaptability.
Qwen 2.5: Domain-specific pretraining (e.g., code, math).
Performance:
DeepSeek leads in coding (36% vs. 28%), math (89% vs. 86%), and reasoning benchmarks.
Qwen excels in multilingual tasks (29 languages vs. DeepSeek's focus on Chinese and English).
Cost & Speed:
Qwen: Lower cost ($0.38/M input tokens) and faster output.
DeepSeek Turbo: 3× throughput + 20% discount for high-volume needs on Novita AI.

If you'd like to evaluate DeepSeek V3 and Qwen 2.5 72B on your own use cases, Novita AI provides a $0.5 credit upon registration to get you started!

The competition between open large language models intensifies with DeepSeek V3 (Dec 2024) and Qwen 2.5 72B (Sep 2024). While DeepSeek's Mixture of Experts (MoE) design targets technical precision and dynamic interaction, Qwen prioritizes multilingual efficiency and cost savings. This comparison explores their strengths, weaknesses, and ideal use cases.

Full Comparison: DeepSeek V3 vs. Qwen 2.5 72B

Category | DeepSeek V3 | Qwen 2.5 72B
Release Date | Dec 27, 2024 | Sep 19, 2024
Model Size | 671B params (37B active/token, MoE) | 72B params (dense)
Training Method | Pre-training → SFT → RL | Domain-specific pretraining (e.g., code/math data)
Training Data | 14.8T tokens | 18T tokens
Key Benchmarks | LiveCodeBench: 36%; GPQA: 56%; MATH-500: 89%; MMLU-Pro: 76% | LiveCodeBench: 28%; GPQA: 49%; MATH-500: 86%; MMLU-Pro: 72%
Multilingual Support | ✅ Chinese, English | ✅ 29 languages
Cost ($/M Tokens) | Input: $0.89; Output: $0.89 (Turbo: 3× throughput + 20% discount) | Input: $0.38; Output: $0.40
Hardware Requirements | VRAM: 171.8GB; GPU: 8~16GB (optimized for MoE) | VRAM: 145.5GB; GPU: minimum 32GB
Strengths | High-precision reasoning; dynamic task adaptation; high throughput | Low cost; multilingual coverage; domain-specific optimizations
Best For | Technical R&D, real-time AI assistants, cloud-scale processing | Budget projects, static multilingual tasks, code/math specialized workflows

Which Model Is Best for You?

Requirement | Recommended Choice
Coding/Math/QA tasks | ✅ DeepSeek V3 (higher accuracy)
Multilingual content | ✅ Qwen 2.5 (29 languages + lower cost)
Real-time interaction | ✅ DeepSeek V3 Turbo (RL-optimized)
Limited budget | ✅ Qwen 2.5 (cost-efficient)
GPU <32GB | ✅ DeepSeek V3 (8~16GB support)

Basic Introduction of the Models

To begin our comparison, let's first review the fundamental characteristics of each model.

DeepSeek V3

  • Release Date: December 27, 2024
  • Key Features:
    • Model Size: 671B parameters (37B active/token)
    • Tokenizer: SentencePiece-based multilingual tokenizer
    • Supported Languages: Primarily Chinese and English
    • Multimodal: Text-only
    • Context Window: 128K tokens
    • Storage Formats: FP8/BF16 inference
    • Architecture: Mixture of Experts (MoE) + Multi-Head Latent Attention
    • Training Data: 14.8T tokens for pre-training
    • Training Method: Pre-training → Supervised Fine-Tuning (SFT) → Reinforcement Learning (RL)

Qwen 2.5 72B

  • Release Date: September 19, 2024 (Qwen 2.5 series)
  • Key Features:
    • Model Size: 72B parameters
    • Supported Languages: Strong multilingual support for 29+ languages
    • Multimodal: Text-only
    • Context Window: Up to 128K tokens (generation up to 8K tokens)
    • Architecture: Dense Transformer with Grouped Query Attention (GQA)
    • Training Data: 18T tokens
    • Training Method: Domain-specific pretraining (separate curated data for, e.g., code and math)

DeepSeek V3 leverages multi-stage training with Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL), enabling continuous optimization from human feedback (e.g., instruction following, safety alignment). Its MoE architecture dynamically adjusts expert weights, allowing a single model to flexibly adapt to multi-domain tasks (e.g., code generation, mathematical reasoning) without requiring task-specific retraining.

In contrast, Qwen 2.5 72B relies solely on pretraining and requires retraining specialized models for different domains (e.g., Qwen2.5-Coder for code and Qwen2.5-Math for mathematics). Although these specialized models achieve significant performance improvements through massive domain-specific data (e.g., 5.5T code tokens for Qwen2.5-Coder) and multiple reasoning methods (Chain-of-Thought, Program-of-Thought, and Tool-Integrated Reasoning), their generalization is limited by static data distributions, making them better suited for specialized tasks (e.g., programming evaluation, bilingual mathematical reasoning) than for dynamic interactive scenarios.

Speed Comparison

If you want to test it yourself, you can start a free trial on the Novita AI website.


[Figure: Output speed and latency of DeepSeek V3 vs. Qwen 2.5 72B. Source: Artificial Analysis]

Cost Comparison on Novita AI

Model | Context | Input Price ($/M Tokens) | Output Price ($/M Tokens)
deepseek/deepseek-v3-turbo | 64,000 | $0.40 | $1.30
deepseek/deepseek_v3 | 64,000 | $0.89 | $0.89
qwen/qwen-2.5-72b-instruct | 32,000 | $0.38 | $0.40

Qwen 2.5 72B delivers higher output speed and lower latency than DeepSeek V3, and its input and output prices are significantly lower.
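As a quick illustration of how these per-million-token prices translate into real spend, here is a minimal sketch. The PRICES dict and job_cost helper are my own; the model ids and prices come from the table above.

```python
# Illustrative cost estimator. Prices are $/M tokens, as listed on Novita AI above.
PRICES = {
    "deepseek/deepseek_v3": {"input": 0.89, "output": 0.89},
    "deepseek/deepseek-v3-turbo": {"input": 0.40, "output": 1.30},
    "qwen/qwen-2.5-72b-instruct": {"input": 0.38, "output": 0.40},
}

def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a job, given per-million-token prices."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# A batch job with 10M input tokens and 2M output tokens:
print(round(job_cost("qwen/qwen-2.5-72b-instruct", 10_000_000, 2_000_000), 2))  # 4.6
print(round(job_cost("deepseek/deepseek_v3", 10_000_000, 2_000_000), 2))        # 10.68
```

At this volume, Qwen's lower prices cut the bill by more than half, which matches the article's budget recommendation.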

It is worth noting that Novita AI offers a Turbo version with 3× throughput and a limited-time 20% discount. Try it now!


Benchmark Comparison

Now that we’ve established the basic characteristics of each model, let’s delve into their performance across various benchmarks. This comparison will help illustrate their strengths in different areas.

Benchmark | DeepSeek V3 (%) | Qwen 2.5 72B (%)
LiveCodeBench (Coding) | 36 | 28
GPQA Diamond | 56 | 49
MATH-500 | 89 | 86
MMLU-Pro | 76 | 72

These results suggest that DeepSeek V3’s machine-driven iterative reinforcement learning approach may be particularly effective for developing stronger capabilities in specialized technical domains requiring precise reasoning and structured problem-solving skills.


Hardware Requirements

Model | VRAM | Recommended GPU
DeepSeek V3 | 171.8GB | 8× RTX 4090, 4× A100, or 2× H100
Qwen 2.5 72B | 145.5GB | 8× RTX 4090, 4× A100, or 2× H100
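A rough sanity check on these VRAM figures: the dominant term is simply parameter count times bytes per parameter. The helper below is my own back-of-the-envelope arithmetic, not a vendor formula.

```python
def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """GB needed to hold the model weights alone: 1B params at 1 byte each is ~1 GB.
    Ignores KV cache, activations, and framework overhead, so real needs are higher."""
    return params_billions * bytes_per_param

# Qwen 2.5 72B stored in FP16/BF16 (2 bytes per parameter):
print(weight_memory_gb(72, 2))  # 144, close to the 145.5GB listed above
```

The small gap between 144GB and the listed 145.5GB is the runtime overhead the simple formula leaves out.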

Applications and Use Cases

DeepSeek V3

Use Cases:

  1. High-Precision Technical Tasks: Code generation, mathematical reasoning, and complex QA (e.g., programming tools, R&D analytics).
  2. Dynamic Interaction: Real-time AI assistants requiring instruction compliance and safety alignment (e.g., finance, legal advisory).
  3. High Throughput: Turbo version suits large-scale batch processing (e.g., multilingual document handling, cloud services).

Strengths:

  • Superior performance in coding (LiveCodeBench: 36%), math (MATH-500: 89%), and reasoning (GPQA: 56%).
  • MoE architecture reduces active parameters (37B/671B), balancing efficiency and accuracy.

Qwen 2.5 72B

Use Cases:

  1. Multilingual Static Tasks: Content generation/translation in 29 languages (e.g., global marketing, localized documentation).
  2. Domain-Specific Workflows: Retrained specialized models (e.g., Qwen2.5-Coder for code evaluation, Qwen2.5-Math for bilingual problem-solving).
  3. Budget-Friendly Projects: Lower cost ($0.38/M input tokens) for basic multilingual needs (e.g., startups, academic research).

Strengths:

  • Massive domain-specific data (5.5T code tokens for coding models).
  • Supports diverse reasoning methods (CoT, PoT, TIR) for structured tasks.

Accessibility and Deployment through Novita AI

Novita AI is an AI cloud platform that gives developers an easy way to deploy AI models through a simple API, along with an affordable, reliable GPU cloud for building and scaling.

Step 1: Log In and Access the Model Library

Log in to your account and click on the Model Library button.


Step 2: Choose Your Model

Browse through the available options and select the model that suits your needs.


Step 3: Start Your Free Trial

Begin your free trial to explore the capabilities of the selected model.


Step 4: Get Your API Key

To authenticate with the API, you will need an API key. Go to the "Settings" page and copy your API key.


Step 5: Install the API

Install the client library using the package manager for your programming language. Since Novita AI exposes an OpenAI-compatible API, Python users can install the OpenAI SDK (pip install openai).


After installation, import the necessary libraries into your development environment and initialize the client with your API key to start interacting with Novita AI's LLMs. Below is an example of using the Chat Completions API in Python.

from openai import OpenAI
  
client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",
    api_key="<YOUR Novita AI API Key>",
)

model = "deepseek/deepseek_v3"
stream = True # or False
max_tokens = 2048
system_content = """Be a helpful assistant"""
temperature = 1
top_p = 1
min_p = 0
top_k = 50
presence_penalty = 0
frequency_penalty = 0
repetition_penalty = 1
response_format = { "type": "text" }

chat_completion_res = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "system",
            "content": system_content,
        },
        {
            "role": "user",
            "content": "Hi there!",
        }
    ],
    stream=stream,
    max_tokens=max_tokens,
    temperature=temperature,
    top_p=top_p,
    presence_penalty=presence_penalty,
    frequency_penalty=frequency_penalty,
    response_format=response_format,
    extra_body={
      "top_k": top_k,
      "repetition_penalty": repetition_penalty,
      "min_p": min_p
    }
  )

if stream:
    for chunk in chat_completion_res:
        print(chunk.choices[0].delta.content or "", end="")
else:
    print(chat_completion_res.choices[0].message.content)
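To run the same request against Qwen 2.5 72B for a side-by-side comparison, only the model id needs to change (id as listed in the pricing table above); everything else in the snippet stays the same:

```python
# Swap the model id to query Qwen 2.5 72B on Novita AI instead of DeepSeek V3.
model = "qwen/qwen-2.5-72b-instruct"
```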
  
  

Upon registration, Novita AI provides a $0.5 credit to get you started!

Once the free credit is used up, you can add funds to continue using the service.

Choose DeepSeek V3 for technical precision and adaptability, or Qwen 2.5 72B for cost-effective multilingual tasks. For enterprises, DeepSeek Turbo’s throughput boost and Novita AI’s free trial make it a compelling option.

Frequently Asked Questions

How do the costs of Qwen 2.5 72B and DeepSeek V3 compare?

Qwen costs $0.38/M input tokens vs. DeepSeek’s $0.89/M.

Why choose Qwen 2.5?

For multilingual support (29 languages) or tight budgets.

How can I test Qwen 2.5 72B and DeepSeek V3?

Both models are available on Novita AI; start a free trial, and note that DeepSeek V3 Turbo currently carries a 20% discount.

Novita AI is the all-in-one cloud platform that empowers your AI ambitions. Integrated APIs, serverless, and GPU instances: the cost-effective tools you need. Eliminate infrastructure overhead, start for free, and make your AI vision a reality.
