Image API

How to Run VLLM on Windows Docker: Simple Guide

novita.ai

Jul 30, 2024 • 4 min read

Master the deployment of vLLM on Windows Docker for improved efficiency and performance. Get expert insights on our blog today.

Key Highlights

In the AI field, Large Language Models (LLMs) play a vital role in various applications, such as natural language processing and text generation.
Trusted platforms like vLLM offer LLMs as a service — under their generally well-regarded security and privacy policies.
VLLM is a powerful distributed inference library for handling large-scale models.
Docker provides an efficient way to containerize applications, making it easy to run vLLM on Windows.
With a guide simplifying the process of running VLLM on Windows Docker, new developers can master Docker and machine learning.

Introduction

In the era of data science and machine learning, LLMs are vast in size and complexity, demanding more meticulous attention to deploy effectively. vLLM, short for Virtual Large Language Models has become crucial for advanced NLP applications. Whether you’re a data scientist, developer, or researcher, running VLLMs efficiently can make a significant difference in your projects. This blog provides a step-by-step process for setting up and running VLLM on Windows using Docker. We’ll cover everything from prerequisites to troubleshooting tips to ensure a smooth setup.

Exploring VLLM and Docker

Basics of VLLM

Before diving into Docker specifics, let’s briefly cover what VLLM is. Virtual Large Language Models (vLLM) is a high-performance, open-source inference server for large language models equipped with PagedAttention. It is created for ease of use and high throughput with algorithms. vLLM is up to 24 times faster than similar solutions offered by other inference servers. They play a crucial role in numerous NLP tasks. Running these models efficiently necessitates strong computational resources and a properly configured environment, where Docker proves to be useful.

Advantages of VLLM

Easy integration with popular models
High throughput by serving more requests per second than traditional methods
Near-zero waste in cache memory, with faster query response times
OpenAI-compatible API server

Why Use Docker?

Docker is an open-source container service platform for developing, shipping, deploying, and running containerized applications. Docker simplifies the configuration and control of software environments through containerization. These containers bundle an application with its requirements, enabling it to operate uniformly on various computing setups. vLLM benefits by avoiding setup complications and version discrepancies, making model deployment and administration easier.

How to Run VLLM on Windows Docker

Here we will take Llama3.1 70B for example to show how to run VLLM on Windows Docker. Novita AI provides LLM API service for this model too. You can visit Model API to see our featured models.

Prerequisites for Running VLLM on Windows Docker

Windows 10 or later: Docker Desktop for Windows is compatible with these versions.
Docker Desktop: Install Docker Desktop from the official Docker website.

Step-by-Step Guide to Running VLLM on Windows Docker

Step 1: Install Docker Desktop

Download Docker Desktop: Visit the Docker website and download it for Windows.
Install Docker: Run the installer and follow the on-screen instructions. Enable virtualization if prompted.

Step 2: Configure Docker for Windows

Start Docker Desktop: Launch Docker Desktop from your Start menu. Keep it in the right directory.
Adjust Resources: Go to Docker Settings > Resources and allocate at least 4 CPUs and 8GB RAM for VLLM.
Clone the VLLM repository:

git clone https://github.com/vllm-project/vllm.git
cd vllm

Step 3: Create Dockerfile for VLLM

Create Dockerfile: In the vLLM directory, create a Dockerfile to set up the environment for VLLM and LLaMA 3.1 70B.

Tips for Running VLLM on Windows Docker

Check Docker Settings: Ensure Docker Desktop is correctly installed and running. Verify that Docker is configured to use Linux containers.
Image and Dependencies: Ensure the vLLM Docker image is correctly downloaded. You can check with docker images. If there are issues with the image, try rebuilding it: docker build -t vllm.
Custom Models: Modify the Dockerfile and requirements.txt include additional libraries or custom VLLM models.
Volume Mounting: Use Docker volumes to persist data and manage large datasets efficiently.

Since it’s hard to do the vLLM deploying steps above, you can find the packed image on DockerHub and upload it to the Template of the Novita AI Instance. Then you can deploy vLLM simply.

Conclusion

Running vLLM on Windows using Docker offers a reliable environment for NLP model development and deployment. This guide helps set up a containerized environment for simplified dependency management and deployment, minimizing software conflicts and versioning issues. For support, check Docker’s official documentation and the vLLM community forums. Integrating Docker with vLLM streamlines your workflow and ensures efficient model performance across platforms.

FAQs

Does vLLM run locally?

VLLM will download the model automatically and store it in your HuggingFace cache directory. If you are running vLLM locally, there will be the default IP address and port.

Does vLLM require CUDA?

CUDA 11.8 or higher is required for GPUs with compute capability 9.0.

Can Docker run directly on Windows?

Docker containers allow you to run Windows programs and executables. The Docker platform is compatible with Windows (x86–64) operating systems.

How can I tell if the Docker daemon is running on Windows?

To check if the Docker daemon is running on Windows, look for the Docker Desktop icon in the system tray or run “docker info” in a PowerShell/Command Prompt window to display Docker environment information if the daemon is active.

Is Docker for Windows free?

Docker Desktop is free for small businesses (with fewer than 250 employees AND less than $10 million in annual revenue), personal use, education, and non-commercial open-source projects. For professional use beyond these categories, a paid subscription is necessary.

Novita AI is the All-in-one cloud platform that empowers your AI ambitions. Integrated APIs, serverless, GPU Instance — the cost-effective tools you need. Eliminate infrastructure, start free, and make your AI vision a reality.

Recommended Reading

1.What is vLLM: Unveiling the Mystery

2.Mastering vLLM Mixtral: Expert Tips for Success

3.Unveiling VLLM List Models: A Comprehensive Guide