
Deploying DeepSeek Models on Novita AI Cloud Platform: A Comprehensive Guide

DeepSeek models have emerged as a compelling choice among large language models (LLMs), offering strong performance at competitive cost. Deploying them successfully, however, requires robust and efficient infrastructure. This guide shows how to deploy DeepSeek models on Novita AI’s cloud platform while balancing performance and cost.

Model Variants Overview

Distilled Versions

  • Based on open-source models (Qwen2.5 and Llama series)
  • Parameter sizes: 1.5B, 7B, 8B, 14B, 32B, and 70B
  • Optimized for efficient inference while maintaining high performance
  • Ideal for cost-effective private deployments
  • Easily deployable through Novita AI’s one-click solution
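All six distilled variants follow the same naming pattern on Hugging Face, `deepseek-ai/DeepSeek-R1-Distill-<Family>-<Size>B`, as in the model ID used in the request later in this guide. The minimal sketch below pulls the base family and nominal size out of such an ID, which is handy when scripting deployments. The pattern is inferred from the IDs in this guide, and `parse_distill_id` and `DistillVariant` are hypothetical helpers, not part of any Novita AI or Hugging Face API.

```python
import re
from typing import NamedTuple


class DistillVariant(NamedTuple):
    family: str       # base-model family named in the ID, e.g. "Qwen" or "Llama"
    params_b: float   # nominal parameter count in billions, read from the name


# Assumed pattern, inferred from IDs such as "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B".
_DISTILL_RE = re.compile(r"DeepSeek-R1-Distill-(Qwen|Llama)-(\d+(?:\.\d+)?)B$")


def parse_distill_id(model_id: str) -> DistillVariant:
    """Extract the base family and nominal size from a distilled model ID."""
    match = _DISTILL_RE.search(model_id)
    if match is None:
        raise ValueError(f"not a DeepSeek-R1 distilled model ID: {model_id!r}")
    return DistillVariant(family=match.group(1), params_b=float(match.group(2)))


print(parse_distill_id("deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"))
```

The size comes from the name only, so it is a nominal figure; the actual parameter counts are slightly different.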

Full-Scale Version

  • DeepSeek-R1-671B
  • Built on the DeepSeek-V3 mixture-of-experts architecture
  • 671B total parameters, of which about 37B are activated per token
  • Requires significant computational resources
  • Available through our optimized API service
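To put "significant computational resources" in numbers, here is a back-of-the-envelope sketch of the memory needed just to hold the full model's weights. It assumes 1 byte per parameter for FP8 (the format DeepSeek released the V3/R1 weights in) and 2 bytes for BF16, uses 24 GB cards such as the RTX 4090 as the unit, and ignores KV cache and activations. Even though only about 37B parameters are active per token, all experts must stay resident in memory. The helper names are illustrative only.

```python
import math


def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """Memory for the weights alone in GB (1 GB = 1e9 bytes); excludes KV cache."""
    return params_billion * bytes_per_param


def min_cards(weights_gb: float, card_gb: float) -> int:
    """Lower bound on the number of cards needed just to hold the weights."""
    return math.ceil(weights_gb / card_gb)


R1_PARAMS_B = 671  # total parameters of DeepSeek-R1 (mixture-of-experts)

for label, bytes_per_param in (("FP8", 1), ("BF16", 2)):
    gb = weight_memory_gb(R1_PARAMS_B, bytes_per_param)
    print(f"{label}: ~{gb:.0f} GB of weights, at least {min_cards(gb, 24)} x 24 GB cards")
```

This is why the full-scale model is offered through the API service rather than as a one-click template on consumer GPUs.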

Deployment Guide

Step 1: Accessing Novita AI Platform

  1. Visit Novita AI’s official website: https://novita.ai/

  2. Create an account or sign in to your existing account

Step 2: Accessing GPU Instance Configuration

  1. Click “GPUs” in the main navigation

  2. Click “Get Started” to proceed

Step 3: Select and Configure the DeepSeek Model

In this guide, we’ll use DeepSeek-R1-Distill-Qwen-32B as an example. You can choose whichever template suits your needs; the template defines the model’s base parameters. You’ll also need to set the number of GPUs, and we recommend RTX 4090 cards for this deployment. All templates use the official DeepSeek models with BF16 precision by default. Our recommended configurations are:

| Model | Precision | GPU | GPU count |
|---|---|---|---|
| DeepSeek-R1-Distill-Qwen-1.5B | BF16 | RTX 4090 | 1 |
| DeepSeek-R1-Distill-Qwen-7B | BF16 | RTX 4090 | 1 |
| DeepSeek-R1-Distill-Llama-8B | BF16 | RTX 4090 | 1 |
| DeepSeek-R1-Distill-Qwen-14B | BF16 | RTX 4090 | 2 |
| DeepSeek-R1-Distill-Qwen-32B | BF16 | RTX 4090 | 4 |
| DeepSeek-R1-Distill-Llama-70B | BF16 | RTX 4090 | 8 |
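The GPU counts in this table can be sanity-checked with simple arithmetic: BF16 stores each parameter in 2 bytes, so a 32B model needs roughly 64 GB for its weights alone. The sketch below assumes weights-only memory, 90% of each 24 GB card usable (an assumption; vLLM, for example, reserves 90% of GPU memory by default), and rounds up to a power of two because tensor-parallel sizes usually have to divide a model's attention-head count evenly. Under these assumptions it reproduces the recommended counts; `recommended_gpus` is a hypothetical helper, not Novita AI's sizing logic.

```python
import math


def recommended_gpus(params_billion: float, gpu_mem_gb: float = 24.0,
                     bytes_per_param: int = 2, usable_fraction: float = 0.9) -> int:
    """Rough GPU count for serving a dense model, rounded up to a power of two.

    Assumptions (not from the Novita templates): weights only, BF16 = 2 bytes
    per parameter, 90% of each card usable, power-of-two tensor parallelism.
    """
    weights_gb = params_billion * bytes_per_param
    min_gpus = math.ceil(weights_gb / (gpu_mem_gb * usable_fraction))
    # Round up to the next power of two: 1, 2, 4, 8, ...
    return 1 << (min_gpus - 1).bit_length()


for size in (1.5, 7, 8, 14, 32, 70):
    print(f"{size}B -> {recommended_gpus(size)} x RTX 4090")
```

The leftover memory on each card is what the serving engine uses for the KV cache, so a count that barely fits the weights leaves little room for long contexts or concurrent requests.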

Select the DeepSeek-R1-Distill-Qwen-32B template, set the GPU count to 4, and click “Deploy”.

Step 4: Customize Deployment

Confirm the template parameters and make sure to fill in the HF_TOKEN variable.

You can obtain an HF_TOKEN as follows:

  1. Visit Hugging Face: https://huggingface.co/

  2. Click “Log In” in the top right corner to sign in, or “Sign Up” to create a new account

  3. After logging in, click your profile picture in the top right and select “Access Tokens” in the left menu

  4. Click “New token” to create a new access token

  5. Select “Read” as the token type, give the token a name (e.g., “text”), and click “Create token” to generate it

  6. Copy the generated token string

After obtaining the token, paste it into the template’s HF_TOKEN environment variable, then click “Next”.
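A truncated or mistyped token is an easy mistake to make when copying. If you keep the token in a local environment variable before pasting it into the template, a purely local pre-flight check like the sketch below can catch obvious problems. It assumes that Hugging Face user access tokens currently start with `hf_`; it does not contact Hugging Face, so it cannot tell whether the token is valid or has the right permissions. `hf_token_problem` is a hypothetical helper.

```python
from collections.abc import Mapping
from typing import Optional


def hf_token_problem(env: Mapping[str, str]) -> Optional[str]:
    """Return a description of what looks wrong with HF_TOKEN, or None if it looks usable."""
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        return "HF_TOKEN is empty or not set"
    if any(ch.isspace() for ch in token):
        return "HF_TOKEN contains whitespace; check for a partial copy"
    # Assumption: Hugging Face user access tokens are currently prefixed with "hf_".
    if not token.startswith("hf_"):
        return "HF_TOKEN does not start with 'hf_'; it may not be a Hugging Face token"
    return None
```

For example, `print(hf_token_problem(os.environ) or "HF_TOKEN looks usable")` checks your shell environment (after `import os`).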

Step 5: Launch an Instance

Click “Launch Instance” to deploy your configured environment.

Wait a few minutes while the instance is created and configured.

Click the dropdown menu to view the instance logs.

After the instance starts, it begins downloading the model. Click “Logs” > “Instance Logs” to monitor the download progress.

When the log shows “INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)”, the server has started successfully. Now let’s access your private model!
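If you copy or save the instance logs and want to check readiness in a script, the startup line quoted above is a convenient marker. The sketch below scans log lines for it and returns the bind address. It assumes only that the line keeps the format shown in this guide; no Novita AI log API is assumed, and the function names are hypothetical. Note that `0.0.0.0:8000` is the address the server listens on inside the instance, not the public URL, which comes from the Connect step.

```python
import re
from collections.abc import Iterable
from typing import Optional

# Matches the Uvicorn startup line quoted in this guide.
_READY_RE = re.compile(r"Uvicorn running on (https?://[\w.\-]+:\d+)")


def ready_address(log_line: str) -> Optional[str]:
    """Return the bind address if this line reports that the server started, else None."""
    match = _READY_RE.search(log_line)
    return match.group(1) if match else None


def first_ready(lines: Iterable[str]) -> Optional[str]:
    """Scan log lines in order and return the first bind address found."""
    for line in lines:
        address = ready_address(line)
        if address:
            return address
    return None
```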

Click “Connect”, then “Connect to HTTP Service [Port 8000]”. Since this is an API service, copy the address it gives you.
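The copied address is only the service root; requests go to paths under it. The sketch below derives the endpoint URLs from that address. The `/v1/chat/completions` path comes from the request in this guide; `/v1/models` is an assumption based on the usual OpenAI-compatible route layout, and `endpoints` is a hypothetical helper. Any path in the copied address is ignored.

```python
from urllib.parse import urlsplit


def endpoints(exposed_address: str) -> dict[str, str]:
    """Derive OpenAI-style endpoint URLs from the address copied from "Connect"."""
    parts = urlsplit(exposed_address.strip())
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"expected an http(s) address, got {exposed_address!r}")
    root = f"{parts.scheme}://{parts.netloc}"
    return {
        "base_url": f"{root}/v1",  # what OpenAI-compatible clients usually ask for
        "chat_completions": f"{root}/v1/chat/completions",
        "models": f"{root}/v1/models",  # assumed standard route
    }
```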

To send requests to your private model, replace `https://f6d29cb6f71e585e-8000.us-ca-1.gpu-instance.novita.ai` with your actual exposed address. Copy the following code to access your private model!

```
$ curl https://f6d29cb6f71e585e-8000.us-ca-1.gpu-instance.novita.ai/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
        "messages": [{"role": "user", "content": "hello"}]
    }'
{"id":"chatcmpl-57b3296f87f54dd4b69cfb6d2196f48e","object":"chat.completion","created":1740711405,"model":"deepseek-ai/DeepSeek-R1-Distill-Qwen-32B","choices":[{"index":0,"message":{"role":"assistant","content":"Alright, the user said \"hello.\" That's a friendly greeting. I should respond in a welcoming manner.\n\nMaybe I can acknowledge their greeting and offer assistance.\n\nIt's important to sound approachable and ready to help.\n\nI'll keep it simple and polite.\n</think>\n\nHello! How can I assist you today?","tool_calls":[]},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"usage":{"prompt_tokens":6,"total_tokens":70,"completion_tokens":64,"prompt_tokens_details":null},"prompt_logprobs":null}
```
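The same request can be sent from Python using only the standard library. Note that the response content holds the model’s reasoning followed by a closing `</think>` tag and then the actual answer; there is no opening `<think>` tag, most likely because the model’s chat template inserts it into the prompt (an assumption). The sketch below mirrors the curl call and splits the reasoning from the answer at the last `</think>`. `build_chat_request`, `split_reasoning`, and `ask` are hypothetical helpers, and, like the curl example, no API key is sent.

```python
import json
import urllib.request

MODEL = "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"


def build_chat_request(base: str, prompt: str, model: str = MODEL) -> urllib.request.Request:
    """Build the same POST request as the curl example above."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def split_reasoning(content: str) -> tuple[str, str]:
    """Split R1-style output into (reasoning, answer) at the last </think> tag.

    An opening <think> tag is removed if present but is not required.
    """
    reasoning, sep, answer = content.rpartition("</think>")
    if not sep:  # no reasoning block: the whole content is the answer
        return "", content.strip()
    return reasoning.replace("<think>", "", 1).strip(), answer.strip()


def ask(base: str, prompt: str, timeout: float = 120.0) -> tuple[str, str]:
    """Send one chat request and return (reasoning, answer)."""
    with urllib.request.urlopen(build_chat_request(base, prompt), timeout=timeout) as resp:
        body = json.load(resp)
    return split_reasoning(body["choices"][0]["message"]["content"])
```

Usage, with your own address in place of the placeholder: `reasoning, answer = ask("https://your-instance-address", "hello")`.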

Configure the API address in your applications like Chatbox, and you’ll have your own personal assistant!
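Besides the address, a client needs the exact model name the server expects in the `model` field. OpenAI-compatible servers usually list served models at `GET /v1/models`, returning JSON with a `data` array of objects that each carry an `id`; that this instance exposes the route is an assumption. The sketch below extracts the IDs from such a response body so you can paste the right name into your client. `model_ids` is a hypothetical helper, and the sample response in the usage line is illustrative, not captured from a real instance.

```python
import json


def model_ids(models_response: str) -> list[str]:
    """Extract model IDs from an OpenAI-style GET /v1/models response body."""
    body = json.loads(models_response)
    return [entry["id"] for entry in body.get("data", [])]
```

For example, `model_ids('{"object": "list", "data": [{"id": "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B", "object": "model"}]}')` yields the single served model ID. Depending on the client, the API address may need to be entered with or without the `/v1` suffix.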

[Novita AI](https://novita.ai/?utm_source=blogs_GPU&utm_medium=article&utm_campaign=Deploying DeepSeek Models on Novita AI Cloud Platform: A Comprehensive Guide) is an AI cloud platform that gives developers an easy way to deploy AI models through a simple API, along with an affordable and reliable GPU cloud for building and scaling.