A Guide to Accessing DeepSeek V3: Locally and via API


Key Highlights

DeepSeek V3 introduces innovative architecture features like MoE and MLA, significantly enhancing efficiency and context length.

DeepSeek V3’s cost-effectiveness is remarkable, with low training costs and API usage fees lower than those of competitors.

Users can access DeepSeek V3 through Novita AI’s API or deploy it locally, offering flexibility for various needs and resources.

DeepSeek, an innovative AI model, catapulted to global fame in late January 2025. Following the release of its V3 model and app, the open-sourcing of its R1 inference model on January 20 sparked worldwide interest, and within days DeepSeek’s app soared to the top of the U.S. App Store, outpacing tech giants. In this article, we’ll examine how to access DeepSeek V3, compare local deployment with API access, and offer guidance for different user needs.

What is DeepSeek-V3?

DeepSeek V3 is a cutting-edge open-source Mixture-of-Experts (MoE) large language model developed by the Chinese AI company DeepSeek. This advanced model has 671 billion parameters, with only 37 billion activated per token during inference, optimizing performance while minimizing resource consumption. DeepSeek V3 is designed to compete with leading models like GPT, particularly excelling in coding and technical tasks.

Key Features

  • Mixture-of-Experts (MoE) Architecture: DeepSeek V3 employs an MoE framework with fine-grained dynamic load-balancing techniques, eliminating the need for auxiliary loss.
  • Multi-Head Latent Attention (MLA): This feature enhances inference efficiency by compressing attention keys and values, reducing memory overhead and enabling the model to handle long context windows of up to 128K tokens.
  • Multi-Token Prediction (MTP): MTP allows DeepSeek V3 to predict multiple tokens simultaneously, improving both training efficiency and inference speed.
  • FP8 Mixed Precision Training: The model utilizes 8-bit floating-point precision for training, reducing memory and computational costs.
  • Bilingual Support: DeepSeek V3 is optimized for both English and Chinese, making it suitable for multilingual applications in these languages.
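The routing idea behind MoE can be sketched in a few lines: a gating network scores every expert for each token, only the top-k experts actually run, and their outputs are combined using the renormalized gate weights. The toy sketch below is illustrative only — the expert count, gate logits, and the experts themselves are made up, and DeepSeek’s actual auxiliary-loss-free load balancing is not modeled:

```python
import math
import random

random.seed(0)

NUM_EXPERTS = 8   # toy scale; DeepSeek V3 itself routes among 256 experts
TOP_K = 2         # experts activated per token

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route(gate_logits, top_k=TOP_K):
    """Pick the top_k experts for one token and renormalize their weights."""
    probs = softmax(gate_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    total = sum(probs[i] for i in top)
    return [(i, probs[i] / total) for i in top]

# Toy "experts": each one just scales its input by a different factor.
experts = [lambda x, s=s: s * x for s in range(1, NUM_EXPERTS + 1)]

def moe_forward(x, gate_logits):
    """Weighted sum of the selected experts' outputs -- only top_k run."""
    return sum(w * experts[i](x) for i, w in route(gate_logits))

logits = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]
print("selected experts:", route(logits))
print("output:", moe_forward(1.0, logits))
```

The key property this illustrates is sparsity: however many experts exist, each token only pays the compute cost of the top-k it is routed to, which is how a 671B-parameter model can activate just 37B parameters per token.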

As illustrated in the chart, DeepSeek-V3 achieves an optimal balance between high performance and low cost through its innovative architectural design, making it a benchmark for performance-to-price ratio. This makes it a versatile choice across budgets and task requirements, and a particularly good fit for:

  • Large-scale inference tasks (e.g., batch content generation).
  • Enterprises and small-to-medium-sized teams with a strong focus on cost-efficiency.
  • Tasks involving mathematics, code generation, and complex logical reasoning.

DeepSeek-V3 is now live on Novita AI! Enjoy incredibly low pricing per million tokens for both input and output—don’t miss this opportunity to access cutting-edge AI at an unbeatable cost!

Compared with Other Models

DeepSeek-V3 is a powerful model excelling in multiple domains, particularly in handling professional knowledge, basic math, and programming tasks. However, it has room for improvement in advanced reasoning and specific domain applications. This indicates areas for future enhancements, such as boosting open-ended problem-solving, complex mathematical reasoning, and performance in practical software engineering scenarios.

If you want to see a more detailed parameter comparison, you can check out the articles: Deepseek v3 vs Llama 3.3 70b: Language Tasks vs Code & Math; Llama 3.2 3B vs DeepSeek V3: Comparing Efficiency and Performance.

How to Access DeepSeek-V3 Locally

Hardware Requirements and Configuration Recommendations

  • Operating System
    • Windows 10 or newer
    • macOS 10.15 or later
    • Linux (Ubuntu 18.04+)
  • CPU
    • Multi-core processor (minimum 4 cores)
  • GPU
    • NVIDIA GPUs recommended for faster inference
    • Minimum 8GB VRAM for the smaller distilled variants (e.g., R1 distills)
    • Far more VRAM required for the full 671B model (see the FAQ below)
    • CPU-only runs are possible but significantly slower
  • Memory (RAM)
    • 8GB: sufficient for the smallest distilled versions (1.5B or 7B)
    • 16GB or more: recommended for mid-range models (14B or 32B)
  • Storage
    • 4–50GB of free space for the distilled models, depending on the size downloaded; the full model requires substantially more
  • Software Requirements
    • Python 3.10 for the official inference scripts
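Before installing anything, it can help to confirm your machine against the list above. The snippet below is a minimal sketch: it reports the OS, Python version, CPU core count, and whether the NVIDIA driver tools are on the PATH, but it does not verify GPU VRAM or free disk space.

```python
import os
import platform
import shutil
import sys

# Quick environment check against the requirements above.
print("OS        :", platform.system(), platform.release())
print("Python    :", sys.version.split()[0])
print("CPU cores :", os.cpu_count())
print("nvidia-smi:", "found" if shutil.which("nvidia-smi") else "not found")
```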

Step-by-Step Installation Guide

1. Clone the Repository:

git clone https://github.com/deepseek-ai/DeepSeek-V3.git

2. Navigate to the Inference Folder and Install Dependencies:

cd DeepSeek-V3/inference
pip install -r requirements.txt

3. Download Model Weights:
Download the model weights from Hugging Face and place them in the designated directory (e.g., /path/to/DeepSeek-V3).

4. Convert Model Weights:
Use the provided convert.py script to convert the weights to a specific format. For example:

python convert.py --hf-ckpt-path /path/to/DeepSeek-V3 --save-path /path/to/DeepSeek-V3-Demo --n-experts 256 --model-parallel 16

5. Run DeepSeek-V3:
Use the torchrun command to start the model. Modify the parameters as needed for your setup. Example:

torchrun --nnodes 2 --nproc-per-node 8 --node-rank $RANK --master-addr $ADDR generate.py --ckpt-path /path/to/DeepSeek-V3-Demo --config configs/config_671B.json --interactive --temperature 0.7 --max-new-tokens 200

6. Batch Inference (Optional):
For batch inference on a given file:

torchrun --nnodes 2 --nproc-per-node 8 --node-rank $RANK --master-addr $ADDR generate.py --ckpt-path /path/to/DeepSeek-V3-Demo --config configs/config_671B.json --input-file $FILE
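For the batch mode above, `generate.py` reads prompts from the file passed via `--input-file`. Assuming a plain text file with one prompt per line (an assumption — check the repository’s README to confirm the expected format), it could be prepared like this:

```python
# Prepare a prompt file for batch inference.
# Assumption: generate.py expects one prompt per line -- verify this
# against the DeepSeek-V3 repository's documentation.
prompts = [
    "Summarize the key ideas of mixture-of-experts models.",
    "Write a Python function that reverses a string.",
    "Explain FP8 mixed-precision training in one paragraph.",
]

with open("prompts.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(prompts))
```

You would then pass the resulting path as `$FILE` in the torchrun command above.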

How to Access DeepSeek-V3 via Novita AI

Novita AI is an AI cloud platform that offers developers an easy way to deploy AI models through our simple API, while also providing affordable and reliable GPU cloud infrastructure for building and scaling.

Step 1: Log In and Access the Model Library

Log in to your account and click on the Model Library button.


Step 2: Choose Your Model

Browse through the available options and select the model that suits your needs.


Step 3: Start Your Free Trial

Begin your free trial to explore the capabilities of the selected model.


Step 4: Get Your API Key

To authenticate with the API, you will need an API key. Open the “Settings” page and copy your API key from there.


Step 5: Install the API

Install the API client library using the package manager specific to your programming language.


After installation, import the necessary libraries into your development environment and initialize the client with your API key to start interacting with Novita AI’s LLM service. Below is an example of using the chat completions API in Python.

from openai import OpenAI
  
client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",
    api_key="<YOUR Novita AI API Key>",
)

model = "deepseek/deepseek_v3"
stream = True # or False
max_tokens = 2048
system_content = """Be a helpful assistant"""
temperature = 1
top_p = 1
min_p = 0
top_k = 50
presence_penalty = 0
frequency_penalty = 0
repetition_penalty = 1
response_format = { "type": "text" }

chat_completion_res = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "system",
            "content": system_content,
        },
        {
            "role": "user",
            "content": "Hi there!",
        }
    ],
    stream=stream,
    max_tokens=max_tokens,
    temperature=temperature,
    top_p=top_p,
    presence_penalty=presence_penalty,
    frequency_penalty=frequency_penalty,
    response_format=response_format,
    extra_body={
      "top_k": top_k,
      "repetition_penalty": repetition_penalty,
      "min_p": min_p
    }
  )

if stream:
    for chunk in chat_completion_res:
        print(chunk.choices[0].delta.content or "", end="")
else:
    print(chat_completion_res.choices[0].message.content)
  
  

Upon registration, Novita AI provides a $0.5 credit to get you started!

If the free credits are used up, you can purchase more to continue using the API.

Which Methods Are Suitable for You?

Comparison of Local vs. API Access

| Feature                | Local Deployment | API Access               |
|------------------------|------------------|--------------------------|
| Control                | High             | Limited                  |
| Customization          | Flexible         | Restricted               |
| Hardware Requirements  | High             | Low                      |
| Initial Cost           | High             | Low                      |
| Scalability            | Limited          | High                     |
| Maintenance Difficulty | High             | Low                      |
| Privacy Protection     | Strong           | Depends on the provider  |

Recommendations for Different User Groups

  • For Researchers and Developers
    • Recommendation: Local deployment of DeepSeek V3.
    • Why: Offers full control over the model, allowing for extensive customization and optimization.
    • Considerations: Requires substantial hardware resources and advanced technical expertise.
  • For Startups and Small-to-Medium Businesses
    • Recommendation: Use the DeepSeek V3 API provided by Novita AI.
    • Advantages: Cost-effective, easy to integrate, and scalable to meet evolving business needs.
    • Best Use Cases: Quickly prototyping ideas and building AI-driven applications without heavy upfront investments.

In conclusion, DeepSeek V3 is a powerful open-source model that delivers outstanding performance in coding, math, and reasoning tasks. It offers flexibility for diverse use cases, whether deployed locally or accessed via APIs. While local deployment provides full control, it demands substantial hardware resources. Alternatively, platforms like Novita AI offer a more accessible and convenient way to utilize the model’s capabilities. The optimal choice depends on your project’s requirements, technical expertise, and budget.

Frequently Asked Questions

How does DeepSeek V3 achieve its efficiency?


DeepSeek V3 uses a Mixture-of-Experts (MoE) architecture, activating only 37 billion parameters per token. It employs Multi-Head Latent Attention (MLA) and a Multi-Token Prediction (MTP) objective to reduce resource consumption and accelerate both training and inference.

What are the key advantages of DeepSeek V3?

DeepSeek V3 excels in coding, math, reasoning, and general knowledge tasks and has strong multilingual support for English and Chinese. It demonstrates exceptional performance across various benchmarks, often surpassing other open-source and even closed-source models.

What are the VRAM requirements for DeepSeek V3?


The VRAM requirements for DeepSeek V3 vary based on precision. For FP16, the 671B model requires approximately 1,543 GB of VRAM, while with 4-bit quantization, it requires approximately 386 GB of VRAM. The active parameters are 37B.
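These figures follow from a simple rule of thumb: parameter count × bytes per parameter, plus overhead for activations and buffers. The estimator below assumes a ~15% overhead factor, which happens to reproduce the numbers quoted above; real usage also depends on batch size and KV cache.

```python
def estimate_vram_gb(params_billion, bits_per_param, overhead=1.15):
    """Rough VRAM estimate: weight bytes times an overhead factor.

    overhead=1.15 is an assumed ~15% margin for activations/buffers;
    actual memory use also depends on KV cache and batch size.
    """
    bytes_total = params_billion * 1e9 * bits_per_param / 8
    return bytes_total * overhead / 1e9

print(f"671B @ FP16 : {estimate_vram_gb(671, 16):,.0f} GB")   # ~1,543 GB
print(f"671B @ 4-bit: {estimate_vram_gb(671, 4):,.0f} GB")    # ~386 GB
```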

Novita AI is the All-in-one cloud platform that empowers your AI ambitions. Integrated APIs, serverless, GPU Instance — the cost-effective tools you need. Eliminate infrastructure, start free, and make your AI vision a reality.
