Deploying DeepSeek Models on Novita AI Cloud Platform: A Comprehensive Guide

Table Of Contents

Model Variants Overview
Deployment Guide

DeepSeek models have emerged as a compelling choice in the LLM space, offering impressive performance at competitive costs. While these models present powerful capabilities, successful deployment requires a robust and efficient infrastructure solution. This guide demonstrates how to leverage Novita AI’s cloud platform for optimal DeepSeek model deployment, combining high performance with cost-effectiveness.

Model Variants Overview

Distilled Versions

Based on open-source models (Qwen2.5 and Llama series)
Parameter ranges: 1.5B, 7B, 8B, 14B, 32B, and 70B
Optimized for efficient inference while maintaining high performance
Ideal for cost-effective private deployments
Easily deployable through Novita AI’s one-click solution

Full-Scale Version

DeepSeek-R1-671B
Built upon DeepSeek-V3 architecture
Features 671B parameters for maximum performance
Requires significant computational resources
Available through our optimized API service

Deployment Guide

Step 1: Accessing Novita AI Platform

Visit Novita AI’s official website: https://novita.ai/

[Try using Novita AI now](https://novita.ai/?utm_source=blogs_GPU&utm_medium=article&utm_campaign=Deploying DeepSeek Models on Novita AI Cloud Platform: A Comprehensive Guide)

Create an account or sign in to your existing account

Step 2: Accessing GPU Instance Configuration

1. Click “GPUs” in the main navigation.

2. Click “Get Started” to proceed.

Step 3: Select and Configure DeepSeek Model

In this guide, we’ll use DeepSeek-R1-Distill-Qwen-32B as an example. You can select any template that fits your needs; each template defines the model’s base parameters. You then need to set the number of GPUs. We recommend RTX 4090 GPUs for this deployment. All templates use the official DeepSeek models with BF16 precision by default. Our recommended configurations are listed below:


Model	Precision	GPU	Quantity
DeepSeek-R1-Distill-Qwen-1.5B	BF16	RTX 4090	1
DeepSeek-R1-Distill-Qwen-7B	BF16	RTX 4090	1
DeepSeek-R1-Distill-Llama-8B	BF16	RTX 4090	1
DeepSeek-R1-Distill-Qwen-14B	BF16	RTX 4090	2
DeepSeek-R1-Distill-Qwen-32B	BF16	RTX 4090	4
DeepSeek-R1-Distill-Llama-70B	BF16	RTX 4090	8
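The GPU quantities in the table can be sanity-checked with a rough memory estimate. The sketch below is an illustration under stated assumptions, not Novita AI's sizing method: it counts only the BF16 weights (2 bytes per parameter), uses the nominal parameter counts from the model names, assumes 24 GB of memory per RTX 4090, and rounds the GPU count up to a power of two, which is a common choice for tensor-parallel serving. KV cache and activation memory are ignored, so real deployments need headroom beyond this figure. The function names are hypothetical.

```python
import math

BYTES_PER_PARAM_BF16 = 2   # BF16 stores each weight in 16 bits
RTX_4090_VRAM_GB = 24      # memory per RTX 4090


def weight_memory_gb(params_billion: float) -> float:
    """Approximate size of the BF16 weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM_BF16


def min_gpus(params_billion: float, vram_gb: float = RTX_4090_VRAM_GB) -> int:
    """Smallest power-of-two GPU count whose combined memory holds the weights.

    Assumption: counts are rounded up to a power of two; KV cache and
    activations are not included in the estimate.
    """
    needed = math.ceil(weight_memory_gb(params_billion) / vram_gb)
    return 1 << (needed - 1).bit_length()


# Nominal sizes taken from the model names in the table above.
for name, size in [
    ("DeepSeek-R1-Distill-Qwen-1.5B", 1.5),
    ("DeepSeek-R1-Distill-Qwen-7B", 7),
    ("DeepSeek-R1-Distill-Llama-8B", 8),
    ("DeepSeek-R1-Distill-Qwen-14B", 14),
    ("DeepSeek-R1-Distill-Qwen-32B", 32),
    ("DeepSeek-R1-Distill-Llama-70B", 70),
]:
    print(f"{name}: ~{weight_memory_gb(size):.0f} GB of weights, {min_gpus(size)} GPU(s)")
```

With these assumptions the estimate agrees with the quantities in the table: the 32B model needs about 64 GB for weights alone, which exceeds two 24 GB cards and rounds up to four.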

Select the DeepSeek-R1-Distill-Qwen-32B template, set the GPU count to 4, and click “Deploy”.

Step 4: Customize Deployment

Confirm the template parameters and make sure to fill in the HF_TOKEN variable.

You can obtain the HF_TOKEN by following these steps:

1. Visit Hugging Face: https://huggingface.co/

2. Click “Log In” in the top right corner to sign in, or “Sign Up” to create a new account.

3. After logging in, click your profile picture in the top right and select “Access Tokens” in the left menu.

4. Click “New token” to create a new access token.

5. Select “Read” as the token type, name your token (e.g., “text”), and click “Create token” to generate it.

6. Copy the generated token string.

After obtaining the token, enter it into the HF_TOKEN environment variable in the template. Then click “Next”.
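Before pasting the token into the template, it can help to confirm that it is valid, since a bad token only surfaces later as a failed model download. The following is a minimal sketch using only the Python standard library. It assumes the Hugging Face Hub endpoint `https://huggingface.co/api/whoami-v2` and a `name` field in its JSON response; the function names are hypothetical and not part of the Novita AI template.

```python
import json
import urllib.request

# Assumption: the Hub's "who am I" endpoint for bearer tokens.
WHOAMI_URL = "https://huggingface.co/api/whoami-v2"


def build_whoami_request(token: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated request to the Hub."""
    return urllib.request.Request(
        WHOAMI_URL,
        headers={"Authorization": f"Bearer {token}"},
    )


def token_owner(token: str, timeout: float = 10.0) -> str:
    """Return the account name the token belongs to.

    urllib raises HTTPError (typically 401) if the token is rejected.
    """
    with urllib.request.urlopen(build_whoami_request(token), timeout=timeout) as resp:
        return json.load(resp)["name"]


# Example call (performs a network request):
#   print(token_owner("hf_your_token_here"))
```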

Step 5: Launch an Instance

Click “Launch Instance” to deploy your configured environment.

Wait a few minutes while the instance is set up.

Click the dropdown menu to view the instance logs.

Once the instance starts, it begins pulling the model. Click “Logs” > “Instance Logs” to monitor the model download progress.

When the log shows “INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)”, the service has started successfully. Now let’s access your private model!

Click “Connect”, then click “Connect to HTTP Service [Port 8000]”. Because this is an API service rather than a web page, copy the address for use in your requests.
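If you script the deployment, you may prefer to poll the exposed address instead of watching the logs. The sketch below is one possible approach, not a Novita AI feature: it assumes the service exposes the OpenAI-compatible `/v1/models` route alongside `/v1/chat/completions`, and the helper names `http_ok` and `wait_until_ready` are hypothetical.

```python
import time
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError, timeouts, refused connections
        return False


def wait_until_ready(
    base_url: str,
    attempts: int = 60,
    delay: float = 10.0,
    probe: Callable[[str], bool] = http_ok,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the service until it responds or the attempts run out.

    Assumption: `/v1/models` is available on the exposed address.
    """
    url = base_url.rstrip("/") + "/v1/models"
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


# Example call (performs network requests; replace with your own address):
#   wait_until_ready("https://f6d29cb6f71e585e-8000.us-ca-1.gpu-instance.novita.ai")
```

The `probe` and `sleep` parameters exist only so the loop can be exercised without a live instance.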

To make requests to your private model, replace “https://f6d29cb6f71e585e-8000.us-ca-1.gpu-instance.novita.ai” with your actual exposed address, then run the following command:

$ curl https://f6d29cb6f71e585e-8000.us-ca-1.gpu-instance.novita.ai/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
        "messages": [{"role": "user", "content": "hello"}]
    }'
{"id":"chatcmpl-57b3296f87f54dd4b69cfb6d2196f48e","object":"chat.completion","created":1740711405,"model":"deepseek-ai/DeepSeek-R1-Distill-Qwen-32B","choices":[{"index":0,"message":{"role":"assistant","content":"Alright, the user said \"hello.\" That's a friendly greeting. I should respond in a welcoming manner.\n\nMaybe I can acknowledge their greeting and offer assistance.\n\nIt's important to sound approachable and ready to help.\n\nI'll keep it simple and polite.\n</think>\n\nHello! How can I assist you today?","tool_calls":[]},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"usage":{"prompt_tokens":6,"total_tokens":70,"completion_tokens":64,"prompt_tokens_details":null},"prompt_logprobs":null}
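The same request can be sent from Python. In the response above, the model's reasoning precedes a `</think>` marker and the final answer follows it, so a client will usually want to separate the two. The sketch below uses only the standard library and mirrors the curl call; the function names are hypothetical, and the split on `</think>` is an assumption based on the sample response rather than a documented contract.

```python
import json
import urllib.request
from typing import Tuple


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build the same POST request as the curl command above."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def split_reasoning(content: str) -> Tuple[str, str]:
    """Split a reply into (reasoning, answer) at the first "</think>".

    If the marker is absent, the whole reply is treated as the answer.
    """
    reasoning, marker, answer = content.partition("</think>")
    if not marker:
        return "", content.strip()
    return reasoning.replace("<think>", "").strip(), answer.strip()


def ask(base_url: str, model: str, prompt: str, timeout: float = 120.0) -> Tuple[str, str]:
    """Send the prompt and return (reasoning, answer)."""
    request = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    return split_reasoning(body["choices"][0]["message"]["content"])


# Example call (performs a network request; replace with your own address):
#   reasoning, answer = ask(
#       "https://f6d29cb6f71e585e-8000.us-ca-1.gpu-instance.novita.ai",
#       "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
#       "hello",
#   )
```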

Configure the API address in applications such as Chatbox, and you’ll have your own personal assistant!

[Novita AI](https://novita.ai/?utm_source=blogs_GPU&utm_medium=article&utm_campaign=Deploying DeepSeek Models on Novita AI Cloud Platform: A Comprehensive Guide) is an AI cloud platform that offers developers an easy way to deploy AI models using our simple API, while also providing the affordable and reliable GPU cloud for building and scaling.
