Since its release in May 2025, DeepSeek R1 0528 has become one of the most talked-about open-source models in the AI world. With 685 billion parameters and performance rivaling top proprietary models, it has impressed developers and researchers alike with its reasoning, coding, and math capabilities.
But as more people rush to try it, a key question keeps coming up:
How much does it actually cost to run this massive model? Read on.
DeepSeek R1 0528 Model Card
DeepSeek R1 0528, released on May 28, 2025, is a powerful open-source AI model known for its advanced reasoning, exceptional performance, and cost-efficiency.
Key Features
- Size: 685 billion total parameters.
- Open Source: Fully open-source under the MIT license; weights available on Hugging Face.
- Architecture: Uses Mixture of Experts (MoE) for dynamic parameter activation, boosting efficiency.
- Language Support: Performs best in English and Chinese.
- Modality: Text-only (no image or audio input support).
- Training Improvements: Enhanced reasoning and inference via optimized post-training methods.
Performance Highlights
- Reasoning and Programming:
  - Strong in advanced math, logic, and programming tasks.
- Math Benchmarks:
  - HMMT 2025: Pass@1 improved from 41.7% → 79.4%.
  - AIME 2025: Pass@1 increased from 70.0% → 87.5%.
- Coding Benchmarks:
  - Codeforces-Div1 rating: 1530 → 1930.
  - Aider-Polyglot accuracy: 53.3% → 71.6%.
  - LiveCodeBench Pass@1: 63.5% → 73.3%.
- Debugging and Code Generation:
  - Self-corrects during code generation, reducing errors.
- Chain-of-Thought Reasoning:
  - Provides step-by-step reasoning for accuracy and transparency.
- Tool Integration:
  - Supports API integration with JSON output and function calling.
  - Tau-Bench Pass@1 scores: Airline 53.5%, Retail 63.9%.
- Reduced Hallucinations:
  - Improved reliability for critical use cases.
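The JSON-output and function-calling support mentioned above follows the OpenAI Chat Completions convention. As a sketch of what a tool-calling request body could look like, here is a minimal example; the `get_weather` function and its schema are hypothetical, invented purely for illustration:

```python
import json

# Hypothetical tool definition in the OpenAI function-calling schema;
# the get_weather function and its fields are invented for illustration.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# Request body as it would be posted to a Chat Completions endpoint.
request_body = {
    "model": "deepseek/deepseek-r1-0528",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [get_weather_tool],
}

print(json.dumps(request_body, indent=2))
```

For structured output without tools, you can instead set `response_format={"type": "json_object"}` on the same request.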
Deployment Options
- Full Model (685B):
  - Requires 24 NVIDIA H100 GPUs (80GB each), 512GB–1TB of system RAM, and robust datacenter infrastructure.
- Distilled Version (Qwen3 8B):
  - Runs on a single NVIDIA RTX 4090 GPU (24GB VRAM).
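As a back-of-envelope check on those requirements, the sketch below estimates VRAM from parameter count. It assumes FP8 weights (1 byte per parameter) for the full model, FP16 for the distilled one, and a flat 20% overhead; all of these are illustrative figures, not vendor specs:

```python
def vram_needed_gb(params_billion: float, bytes_per_param: float,
                   overhead: float = 0.2) -> float:
    """Rough memory estimate: weights at the given precision, plus a flat
    fudge factor for KV cache and activations (illustrative only)."""
    weights_gb = params_billion * bytes_per_param  # 1e9 params * bytes ≈ GB
    return weights_gb * (1 + overhead)

full = vram_needed_gb(685, 1.0)     # full model in FP8 (1 byte/param)
distilled = vram_needed_gb(8, 2.0)  # distilled 8B model in FP16

print(f"Full 685B @ FP8:     ~{full:.0f} GB of VRAM")
print(f"Distilled 8B @ FP16: ~{distilled:.0f} GB -> fits a 24GB RTX 4090")
```

Even this lower bound (roughly 820 GB) exceeds ten 80GB cards; real deployments need further headroom for tensor-parallel replication and KV cache, which is why the full model calls for a couple dozen H100s rather than eleven.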
API Cost of DeepSeek R1 0528
When to Use API Access?
Use the API when:
- You want zero setup or infrastructure maintenance
- You’re running batch inference or fine-tuning jobs
- You prefer on-demand, scalable workloads
- You value token-based pricing (input/output)
DeepSeek R1 0528 API Pricing Comparison
| Provider | Input ($/M) | Output ($/M) |
|---|---|---|
| Novita AI | 0.70 | 2.50 |
| Fireworks AI | 3.00 | 8.00 |
| Nebius AI Studio | 0.80 | 2.40 |
| Parasail | 0.79 | 4.00 |
✅ Novita AI offers the lowest API token cost. Ideal for cost-sensitive and scalable tasks like LLMOps, bulk inference, or non-interactive batch pipelines.
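Token-based pricing makes per-request cost easy to estimate. A small helper, with rates hard-coded from the table above (per-million-token pricing assumed):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_price: float, out_price: float) -> float:
    """USD cost of one request, with prices quoted per million tokens."""
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# Example: 2,000 input tokens and 500 output tokens at Novita AI's rates
cost = request_cost(2_000, 500, in_price=0.70, out_price=2.50)
print(f"${cost:.5f} per request")  # $0.00265
```

At these rates, a million such requests would cost about $2,650, which is the scale at which the per-token differences between providers start to matter.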
API Usage Guide
To get started, simply use the code snippet below:
- Unified endpoint: `/v3/openai` supports OpenAI’s Chat Completions API format.
- Flexible controls: Adjust temperature, top-p, penalties, and more for tailored results.
- Streaming & batching: Choose your preferred response mode.
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",
    api_key="<YOUR_NOVITA_API_KEY>",  # get this from your Novita AI dashboard
)

model = "deepseek/deepseek-r1-0528"
stream = True  # set to False for a single, complete response
max_tokens = 2048

# Sampling and penalty settings (defaults shown; tune as needed)
system_content = "Be a helpful assistant"
temperature = 1
top_p = 1
min_p = 0
top_k = 50
presence_penalty = 0
frequency_penalty = 0
repetition_penalty = 1
response_format = {"type": "text"}

chat_completion_res = client.chat.completions.create(
    model=model,
    messages=[
        {"role": "system", "content": system_content},
        {"role": "user", "content": "Hi there!"},
    ],
    stream=stream,
    max_tokens=max_tokens,
    temperature=temperature,
    top_p=top_p,
    presence_penalty=presence_penalty,
    frequency_penalty=frequency_penalty,
    response_format=response_format,
    # Parameters outside the standard OpenAI schema go in extra_body
    extra_body={
        "top_k": top_k,
        "repetition_penalty": repetition_penalty,
        "min_p": min_p,
    },
)

if stream:
    for chunk in chat_completion_res:
        print(chunk.choices[0].delta.content or "", end="")
else:
    print(chat_completion_res.choices[0].message.content)
```
You Can Also Connect the DeepSeek R1 0528 API on Third-Party Platforms
- Hugging Face: Use DeepSeek R1 0528 in Spaces, pipelines, or with the Transformers library via Novita AI endpoints.
- Agent & Orchestration Frameworks: Easily connect Novita AI with partner platforms like Continue, AnythingLLM, LangChain, Dify and Langflow through official connectors and step-by-step integration guides.
- OpenAI-Compatible API: Enjoy hassle-free migration and integration with tools such as Cline and Cursor, designed for the OpenAI API standard.
GPU Cloud Cost of DeepSeek R1 0528
When to Use GPU Instances?
Use cloud GPU if:
- You need full control over model execution
- You want to run custom fine-tuning
- You need longer sessions or persistent inference servers
- You’re using quantized models or accelerated frameworks
GPU Rental Pricing Comparison (per hour)
| Provider | GPU Type | Price/hr |
|---|---|---|
| Novita AI | A100 SXM | $1.60 |
| Novita AI | H100 SXM | $2.41 |
| Novita AI | H200 SXM | $2.99 |
| Lambda Cloud | H100 SXM | $3.29 |
| RunPod | A100 SXM | $1.74 |
| RunPod | H100 SXM | $2.69 |
| RunPod | H200 | $3.99 |
| Fireworks AI | H100 | $5.80 |
| Fireworks AI | H200 | $6.99 |
✅ Novita AI offers the lowest hourly rate for every GPU type listed, and the A100 is the most budget-friendly card overall.
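To turn those hourly rates into a budget, a quick sketch (rates copied from the Novita AI rows of the table; the pool size and utilization are illustrative assumptions):

```python
# Novita AI hourly rates from the table above (USD per GPU-hour)
HOURLY = {"A100 SXM": 1.60, "H100 SXM": 2.41, "H200 SXM": 2.99}

def monthly_cost(gpu: str, count: int, hours_per_day: float = 24,
                 days: int = 30) -> float:
    """Rental cost for a fixed-size pool of GPUs over one month."""
    return HOURLY[gpu] * count * hours_per_day * days

# e.g. eight H100s running around the clock
print(f"8x H100 SXM, 24/7 for 30 days: ${monthly_cost('H100 SXM', 8):,.2f}")
```

Idle GPUs still bill by the hour, so scaling the pool down during off-peak hours cuts this figure roughly in proportion to utilization.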
Cloud GPU Usage Guide
Step 1: Register an account
Create your Novita AI account through our website. After registration, navigate to the “Explore” section in the left sidebar to view our GPU offerings and begin your AI development journey.

Step 2: Explore Templates and GPU Servers
Choose from templates like PyTorch, TensorFlow, or CUDA that match your project needs. Then select your preferred GPU configuration—options include the powerful L40S, RTX 4090 or A100 SXM4, each with different VRAM, RAM, and storage specifications.

Step 3: Tailor Your Deployment
Customize your environment by selecting your preferred operating system and configuration options to ensure optimal performance for your specific AI workloads and development needs.

Step 4: Launch an instance
Select “Launch Instance” to start your deployment. Your high-performance GPU environment will be ready within minutes, allowing you to immediately begin your machine learning, rendering, or computational projects.

Local Deployment Cost of DeepSeek R1 0528
When to Deploy Locally?
Only consider on-premises deployment if:
- You need complete data control
- You already have datacenter-grade infrastructure
- You plan to run massive-scale, continuous inference
- You’re a research lab or enterprise with a multi-million-dollar budget
Estimated Cost to Deploy Full DeepSeek R1 0528 Locally
| Component | Specs / Qty | Cost (USD) |
|---|---|---|
| NVIDIA A100 GPUs | 116 × A100 80GB | $2,577,251.96 |
| Server Nodes (Dual A100) | 58 × $50K | $2,900,000 |
| InfiniBand Networking | High-speed fabric | $100,000 |
| NVMe SSD Storage (100TB) | 4–6 GB/s read/write | $20,000 |
| Liquid Cooling + Rack | Enterprise-grade systems | $80,000 + $10,000 |
| Software & Licenses | Frameworks + OS | $10,000 |
| Power Infrastructure | UPS + Power Delivery | $50,000 |
| Electricity (Annual) | 700W per GPU | $50,000 |
| Maintenance & Support | Annual contracts | $100,000 |
| Total Estimate | | $5.89M+ |
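Using the table's bottom line, here is a rough break-even calculation against renting the same 116 A100s from the cloud. It assumes Novita AI's $1.60/hr A100 rate, 24/7 utilization, and treats the electricity and maintenance rows as recurring annual opex; all figures are illustrative:

```python
CAPEX = 5_890_000        # total build-out estimate from the table (USD)
ANNUAL_OPEX = 150_000    # electricity + maintenance rows (USD/year)
CLOUD_RATE = 1.60        # assumed cloud A100 rate (USD per GPU-hour)
GPUS = 116

cloud_per_year = CLOUD_RATE * GPUS * 24 * 365
years_to_break_even = CAPEX / (cloud_per_year - ANNUAL_OPEX)

print(f"Renting 116x A100 around the clock: ${cloud_per_year:,.0f}/year")
print(f"Rough break-even vs. on-prem: {years_to_break_even:.1f} years")
```

In other words, on-prem only starts paying off after roughly four years of continuous, full utilization, which is why the "only if" checklist above is so restrictive.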
DeepSeek R1 0528 vs Other Models
DeepSeek R1 0528 vs Other Models: Price
| Model | Input Cost ($/M) | Output Cost ($/M) |
|---|---|---|
| DeepSeek R1 0528 | 0.70 | 2.50 |
| Gemini 2.5 Pro | 1.25–2.50 | 10–15 |
| OpenAI o3-pro | 20.00 | 80.00 |
DeepSeek R1 0528 vs Other Models: Performance

With performance close to top-tier models at up to 32 times lower output-token cost, DeepSeek R1 0528 is the most cost-effective choice on the current market.
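The "32 times" figure comes straight from the output-token prices in the table above; a quick sanity check:

```python
deepseek_out = 2.50      # $/M output tokens (Novita AI rate)
o3_pro_out = 80.00       # $/M output tokens
gemini_out_high = 15.00  # $/M output tokens (top of the quoted range)

print(f"vs. o3-pro output pricing:     {o3_pro_out / deepseek_out:.0f}x cheaper")
print(f"vs. Gemini 2.5 Pro (high end): {gemini_out_high / deepseek_out:.0f}x cheaper")
```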
Conclusion
Whether you’re building scalable AI pipelines, fine-tuning models, or deploying LLMs in production, DeepSeek R1 0528 on Novita AI offers the most cost-effective and flexible solution—without the infrastructure burden.
| Use Case | Best Choice | Why? |
|---|---|---|
| Batch Inference / Token Efficiency | Novita AI API | Cheapest input/output rates |
| Long-running / fine-tuning tasks | Novita AI GPU | Lowest hourly GPU rental |
| Private, secure, large-scale ops | On-Premise (if budget allows) | Full control, high complexity |
| Need high accuracy & cost control | DeepSeek R1 0528 | Beats Gemini/OpenAI in price |
Frequently Asked Questions
**How much does it cost to deploy DeepSeek R1 0528 locally?**
The estimated cost for building your own infrastructure is around $5.89M. However, using Novita AI’s cloud GPUs significantly reduces upfront costs, with H100 GPUs starting at $2.41/hour.
**How do I fine-tune DeepSeek R1 0528?**
Prepare a clean, relevant dataset and use LoRA adapters or PEFT methods to efficiently fine-tune specific layers of the model. This ensures high performance without overfitting.
**Can I deploy a fine-tuned model on Novita AI?**
Yes, Novita AI supports deploying fine-tuned models as dedicated endpoints, with options for autoscaling, multi-LoRA setups, and API integration for seamless use in your applications.
**What is Novita AI?**
Novita AI is an AI cloud platform that offers developers an easy way to deploy AI models using a simple API, while also providing an affordable, reliable GPU cloud for building and scaling.