The world of AI video generation has taken a significant leap forward with the integration of WAN 2.1 into ComfyUI. This powerful combination offers creators and developers new possibilities in video generation, from text-to-video to image-to-video conversion. This guide will walk you through everything you need to know about setting up and using these tools effectively.
Understanding WAN 2.1 Video Models
WAN 2.1 represents the latest generation of AI-powered video models, specifically designed to meet the diverse needs of video creators. It leverages advanced neural networks to produce high-quality, realistic video outputs from prompts or predefined content. The model is built to handle a variety of video formats, offering flexibility in video length, resolution, and style.
Key features of WAN 2.1 include:
- High-fidelity video generation: Delivering stunning detail and realism in each frame.
- Customization options: Allowing creators to adjust various parameters to fine-tune video content.
- Efficiency and speed: WAN 2.1 significantly reduces the time required to generate long or complex videos.
The model has become popular for applications in marketing, film production, social media content creation, and educational videos.
Understanding ComfyUI
ComfyUI is a versatile interface that simplifies the process of working with AI models like WAN 2.1. Its intuitive design allows users to configure complex video generation processes without needing extensive coding knowledge. ComfyUI’s primary focus is on providing a clean and efficient user experience while offering full control over the video generation workflow.
Key features of ComfyUI include:
- User-friendly design: A simple, clean interface that caters to both beginners and experienced users.
- Seamless integration: Works smoothly with models like WAN 2.1, providing powerful tools to manage video generation tasks.
- Customization and flexibility: Offers various settings for controlling output quality, video length, and style, giving users complete creative control.
System Requirements and Hardware Considerations
Before setting up WAN 2.1 and ComfyUI, it’s essential to ensure your system meets the necessary hardware and software requirements. Running WAN 2.1 for video generation is a resource-intensive process, so having the right setup is critical to avoid lag or rendering issues.
GPU Requirements
A robust Graphics Processing Unit (GPU) is essential for handling the computational load of WAN 2.1 and other machine learning models. Ideally, your system should be equipped with a modern NVIDIA GPU that supports CUDA and Tensor cores, as these features significantly improve performance during deep learning tasks. Popular options include:
- NVIDIA RTX 3080, 3090, or RTX 4090: These GPUs offer exceptional performance for video generation tasks, providing the power needed to run WAN 2.1 smoothly.
- NVIDIA H100 or A100: For users looking for even more power, these data center GPUs are perfect for high-demand video generation tasks, although they come with a higher price tag.
VRAM and Performance
The performance of WAN 2.1 models is heavily influenced by available VRAM and GPU capabilities (a quick VRAM check is sketched after this list):
- Minimum VRAM Requirements:
- For higher-resolution outputs (e.g., 720P), 24GB or more of VRAM is recommended for optimal performance.
- For lower-resolution outputs, such as 480P, 8–12GB of VRAM may suffice, depending on the model used.
- Performance Metrics:
- On a high-end GPU like the RTX 4090, generating a 5-second 480P video using the WAN 2.1 Text-to-Video 1.3B model can take approximately 4 minutes.
- For GPUs with lower VRAM (e.g., RTX 3060), expect slower processing times and potential limitations with higher-resolution models.
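Before committing to a model variant, it helps to confirm how much VRAM your card actually has. A minimal check, assuming an NVIDIA GPU with the standard driver tools installed:
# Query the name, total VRAM, and currently free VRAM of each GPU
nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv
If the reported total is well below 24GB, the lower-resolution 1.3B model is the safer starting point.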
Recommended Setup for Best Performance
- GPU: NVIDIA RTX 4090 or NVIDIA A100, both of which offer superior performance for large video models.
- RAM: 64 GB+ for handling high-resolution videos and complex projects.
- Storage: 1 TB SSD for faster data access and to store large video files.
Installation and Setup
Step 1: Install/Update ComfyUI
Option 1: Update Existing ComfyUI
If you already have ComfyUI installed, run the following in your ComfyUI directory:
git pull origin master
Option 2: Fresh Installation
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
python -m pip install torch torchvision torchaudio
python -m pip install -r requirements.txt
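Optionally, you may want to isolate ComfyUI’s dependencies in a virtual environment before running the pip commands above; this is a common setup choice rather than a requirement:
# Optional: create and activate an isolated Python environment
python -m venv venv
source venv/bin/activate   # on Windows: venv\Scripts\activate
python -m pip install --upgrade pip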
Step 2: Download Required Model Files
Download the following 4 files and place them in the specified directories (a command-line sketch of the expected layout follows this list):
- Diffusion model: choose a WAN 2.1 diffusion model variant and place it in ComfyUI/models/diffusion_models/
- Text encoder model: place umt5_xxl_fp8_e4m3fn_scaled.safetensors in ComfyUI/models/text_encoders/
- CLIP vision model: place clip_vision_h.safetensors in ComfyUI/models/clip_vision/
- VAE model: place wan_2.1_vae.safetensors in ComfyUI/models/vae/
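A minimal shell sketch of where these files should end up; the diffusion-model filename is only a placeholder for whichever WAN 2.1 variant you chose:
# Create the target directories if they do not already exist
mkdir -p ComfyUI/models/diffusion_models ComfyUI/models/text_encoders ComfyUI/models/clip_vision ComfyUI/models/vae
# Expected layout after copying the downloaded files:
# ComfyUI/models/diffusion_models/<your_wan2.1_diffusion_model>.safetensors
# ComfyUI/models/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors
# ComfyUI/models/clip_vision/clip_vision_h.safetensors
# ComfyUI/models/vae/wan_2.1_vae.safetensors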
Step 3: Launch ComfyUI
python main.py
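By default this starts the server on port 8188 and binds to localhost only. If you are on a card with limited VRAM, or running on a remote or cloud machine, ComfyUI accepts a few commonly used launch flags (optional variations, not required for a basic run):
# Offload more aggressively to system RAM on GPUs with limited VRAM
python main.py --lowvram
# Bind to all interfaces and choose the port explicitly (useful on a cloud instance)
python main.py --listen 0.0.0.0 --port 8188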
Step 4: Getting Started
Open http://localhost:8188 in your browser and load one of the example workflows to get started.
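If you later want to queue generations without the browser, ComfyUI also exposes an HTTP API. A minimal sketch, assuming you have exported a workflow in API format (the “Save (API Format)” option, available after enabling dev mode in the settings) to a hypothetical file named wan_t2v_workflow_api.json:
# POST the exported workflow to the /prompt endpoint to queue a generation
curl -X POST http://localhost:8188/prompt \
  -H "Content-Type: application/json" \
  -d "{\"prompt\": $(cat wan_t2v_workflow_api.json)}"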
Novita AI – Your First Choice for Cloud Deployment of WAN and ComfyUI
Novita AI offers a robust cloud platform for deploying AI applications, including the integration of WAN 2.1 models with ComfyUI. This setup allows users to leverage high-performance GPUs without the need for local hardware investments, making it an ideal choice for creators and developers looking to scale their AI video generation capabilities efficiently.
Step 1: Create an Account
Visit the Novita AI website. Once registered, navigate to the “GPUs” tab to browse available resources and begin your AI journey.

Step 2: Select Your GPU
We offer a variety of pre-designed templates crafted to meet your specific needs, while also giving you the flexibility to create custom templates from scratch. Powered by high-performance GPUs such as the NVIDIA H100, with ample VRAM and RAM, our platform ensures smooth and efficient training of even the most complex AI models.

Step 3: Customize Your Setup
Our platform offers flexible storage solutions tailored to your needs, including 60GB of free Container Disk storage. Need more space? Additional storage can be easily purchased to scale with your growing requirements.

Step 4: Launch Your Instance
Select “On Demand”, review your instance configuration and pricing details. When ready, click “Deploy” to launch your GPU instance.

Announcing the launch of Novita GPU Instance Subscription Plans!
Key Features:
- Flexible Billing Options: Choose between pay-as-you-go and monthly subscription billing when creating your instance
- Enhanced Resource Guarantee: During your subscription period, your instance resources remain reserved even when powered off, significantly improving the user experience
- Seamless Service Conversion: Easily convert from pay-as-you-go to the subscription model, with the option to renew during the subscription period
- Subscription Discounts: Monthly subscriptions offer at least 10% savings compared to pay-as-you-go rates, with greater discounts for longer commitment periods
Conclusion
The combination of WAN 2.1 and ComfyUI offers a powerful toolset for AI video generation, providing high-quality output, hardware efficiency, and creative flexibility. Whether you’re a professional or an individual creator, this setup allows you to produce professional-grade videos with ease, pushing the boundaries of what’s possible in AI-driven video creation.
Frequently Asked Questions
Can I run WAN 2.1 on my local machine?
While possible, we recommend using cloud GPU services like Novita AI for optimal performance. WAN 2.1 requires significant GPU resources, typically a minimum of 12GB VRAM for basic operations.
Do I need coding experience to use ComfyUI?
No coding experience is required. ComfyUI provides a visual node-based interface that allows you to create workflows through drag-and-drop operations.
How much VRAM do I need to run WAN 2.1?
For best performance, we recommend 16GB+ VRAM. However, you can run with 12GB VRAM using optimization techniques, though this may limit some features.
What is Novita AI?
Novita AI is an AI cloud platform that offers developers an easy way to deploy AI models using our simple API, while also providing an affordable and reliable GPU cloud for building and scaling.
Recommended Reading
Choosing the Right GPU for Your Wan 2.1
Wan2.1 vs HunyuanVideo: Architecture, Efficiency, and Quality
Wan2.1 vs Sora: Open-Source vs Advanced Editing Features