Novita offers competitive pricing for Wan 2.1.
For example, a 5-second Wan 2.1 I2V video costs $0.30 at 720P and $0.20 at 480P.
Up to 3 LoRAs are currently supported.
Start a free trial on Novita AI today. To integrate the Wan 2.1 API, see our developer docs for details.
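To get a feel for what a batch of clips might cost, the per-video prices quoted above can be turned into a small estimator. This is only a sketch: the function name and the price table are illustrative, and the figures should be checked against the current pricing page before budgeting.

```python
# Per-video prices in US cents for 5-second Wan 2.1 I2V clips,
# copied from the figures quoted above (assumption: still current).
PRICE_CENTS = {"720P": 30, "480P": 20}


def estimate_cost_usd(resolution: str, num_videos: int) -> float:
    """Return the estimated cost in USD for a batch of 5-second videos."""
    if num_videos < 0:
        raise ValueError("num_videos must be non-negative")
    if resolution not in PRICE_CENTS:
        raise ValueError(f"unknown resolution: {resolution!r}")
    # Multiply in integer cents first to avoid floating-point drift.
    return PRICE_CENTS[resolution] * num_videos / 100


print(estimate_cost_usd("720P", 10))  # 3.0
print(estimate_cost_usd("480P", 25))  # 5.0
```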
Wan 2.1 I2V (Image-to-Video) is a cutting-edge video generation model that combines state-of-the-art technologies like Wan-VAE and Video Diffusion DiT. It excels at high-fidelity video reconstruction, efficient compression, and seamless text-to-video generation, supported by a robust and clean training dataset.
Wan2.1 I2V Ability

Key Innovations of Wan 2.1
1. Wan-VAE
Overview
- A 3D variational autoencoder (VAE) designed for efficient compression and high-fidelity motion reproduction.
- Capable of encoding and decoding 1080P videos while maintaining temporal coherence.
- Integrates multiple strategies to optimize spatio-temporal compression, reduce memory usage, and ensure temporal causality.
Problems Addressed
- Efficient Compression: Reduces storage and computational requirements for video data.
- High-fidelity Reconstruction: Ensures the generated videos are of high quality and motion is coherent.
- Temporal Consistency: Avoids common issues like frame discontinuity or jitter in generated videos.
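The idea of temporal causality mentioned above can be illustrated with a toy example. The sketch below is not Wan-VAE's architecture; it is a plain causal moving average over a list of per-frame values, chosen only to show the property that the output at frame t depends on frame t and earlier frames, never on later ones. The function name and window size are illustrative.

```python
def causal_temporal_average(frames, window=3):
    """Average each frame with up to window-1 *previous* frames only."""
    out = []
    for t in range(len(frames)):
        start = max(0, t - window + 1)
        past = frames[start:t + 1]  # never looks past index t
        out.append(sum(past) / len(past))
    return out


clip = [1.0, 2.0, 3.0, 4.0, 5.0]       # one scalar per frame, for brevity
edited = clip[:3] + [100.0, 200.0]      # change only the last two frames

original_out = causal_temporal_average(clip)
edited_out = causal_temporal_average(edited)

print(original_out)                      # [1.0, 1.5, 2.0, 3.0, 4.0]
# Changing future frames leaves the earlier outputs untouched.
print(original_out[:3] == edited_out[:3])  # True
```

Because each output only needs a bounded window of past frames, a causal design of this kind can process a long video chunk by chunk, which is one plausible reason causality helps with memory usage.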
2. Video Diffusion DiT
Overview
- Built on Diffusion Transformers, enhanced by the Flow Matching framework.
- Encodes multilingual text input with a T5 encoder and injects the resulting text embedding into each transformer block through cross-attention.
- Uses an MLP shared across all transformer blocks to predict modulation parameters from the time embedding, while each block learns its own set of biases, which improves performance at the same parameter scale.
Problems Addressed
- Deep Integration of Text and Video Generation: Allows the model to better understand and generate videos according to textual descriptions.
- Improved Generation Performance: Significantly enhances the quality and expressiveness of generated videos without increasing parameter count.
- Multimodal Support: Handles multiple languages and input types, broadening application scenarios.
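The shared-MLP modulation described above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the model's implementation: the sizes are tiny, the weights are random, the class names are made up, and the choice of six modulation parameters per block is an assumption made for the example.

```python
import math
import random


def silu(x):
    """SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


class SharedModulationMLP:
    """One SiLU + linear layer whose weights are shared by every block."""

    def __init__(self, embed_dim, num_params, rng):
        self.weight = [[rng.uniform(-0.1, 0.1) for _ in range(embed_dim)]
                       for _ in range(num_params)]

    def __call__(self, time_embedding):
        activated = [silu(v) for v in time_embedding]
        return [sum(w * a for w, a in zip(row, activated))
                for row in self.weight]


class Block:
    """Stand-in for a transformer block: it owns only a modulation bias."""

    def __init__(self, num_params, rng):
        self.bias = [rng.uniform(-0.1, 0.1) for _ in range(num_params)]

    def modulation(self, shared_out):
        # Shared prediction plus this block's own learned offset.
        return [s + b for s, b in zip(shared_out, self.bias)]


rng = random.Random(0)
NUM_PARAMS = 6  # assumed count, e.g. shift/scale/gate for two sub-layers
mlp = SharedModulationMLP(embed_dim=4, num_params=NUM_PARAMS, rng=rng)
blocks = [Block(NUM_PARAMS, rng) for _ in range(2)]

shared = mlp([0.1, -0.2, 0.3, 0.4])  # computed once per timestep
mods = [blk.modulation(shared) for blk in blocks]
print(len(mods[0]), mods[0] != mods[1])
```

The design point is that the expensive part (the MLP) is paid for once, while each block adds only a small bias vector, so blocks can behave differently without a per-block MLP.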
3. Candidate Dataset
Overview
- A large-scale, curated, and deduplicated dataset of images and videos.
- Employs a four-step data cleaning process, focusing on data dimensions, visual quality, and motion quality.
- Builds a diverse and high-quality training set.
Problems Addressed
- Data Noise and Redundancy: Effectively removes low-quality or duplicate data, improving the effectiveness of training data.
- Diversity and Quality: Provides the model with rich and clean samples, enhancing generalization and generation capabilities.
- Large-scale Training: Supports efficient training on large, high-quality datasets.
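A cleaning pipeline of the kind described above might look like the sketch below. Everything specific here is hypothetical: the mapping of the four steps to deduplication, a minimum-size check, a visual-quality filter, and a motion-quality filter is one reading of the description, and the field names, scores, and thresholds are invented for illustration.

```python
import hashlib


def clean(samples, min_side=480, min_visual=0.5, min_motion=0.5):
    """Return the samples that survive four illustrative cleaning steps."""
    seen = set()
    kept = []
    for s in samples:
        digest = hashlib.sha256(s["content"]).hexdigest()
        if digest in seen:                          # step 1: exact duplicates
            continue
        seen.add(digest)
        if min(s["width"], s["height"]) < min_side:  # step 2: dimensions
            continue
        if s["visual"] < min_visual:                # step 3: visual quality
            continue
        if s["motion"] < min_motion:                # step 4: motion quality
            continue
        kept.append(s)
    return kept


samples = [
    {"content": b"clip-a", "width": 1280, "height": 720, "visual": 0.9, "motion": 0.8},
    {"content": b"clip-a", "width": 1280, "height": 720, "visual": 0.9, "motion": 0.8},
    {"content": b"clip-b", "width": 320, "height": 240, "visual": 0.9, "motion": 0.8},
    {"content": b"clip-c", "width": 1280, "height": 720, "visual": 0.2, "motion": 0.8},
    {"content": b"clip-d", "width": 1280, "height": 720, "visual": 0.9, "motion": 0.1},
    {"content": b"clip-e", "width": 1920, "height": 1080, "visual": 0.7, "motion": 0.6},
]

print([s["content"] for s in clean(samples)])  # [b'clip-a', b'clip-e']
```

A real pipeline would use perceptual rather than exact hashing and learned quality scores; the exact-hash version is used here only to keep the example self-contained.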
VBench Results of Wan 2.1
Wan 2.1 (Wan-14B) demonstrates excellent performance in core tasks such as ID consistency, physical plausibility, and smoothness. Its overall weighted score is among the highest in the industry, making it one of the leading video generation models available today. However, there is still room for improvement in areas like stylization ability and camera control.

Hardware Requirements of Wan 2.1
Wan 2.1 has high hardware requirements, especially for high-resolution and large-model tasks. The memory requirement for Wan 2.1 I2V approaches 80GB. It is recommended to use multiple high-end, data center-grade GPUs (such as A100, H100, or H20) to meet the memory and speed demands. Consumer-grade GPUs are only suitable for small models and low-resolution scenarios.
| GPU | Single Card Compatible | Multi-GPU Recommendation | Recommendation Level |
|---|---|---|---|
| RTX 4090 | Not Supported | Not Recommended | Only for T2V-1.3B at 480P |
| H20 | Not Supported | 4 GPUs or 8 GPUs | ★★★ |
| A800/A100 | Supported | 4 GPUs or 8 GPUs | ★★★★ |
| H800/H100 | Supported | 4 GPUs or 8 GPUs | ★★★★★ |
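A back-of-the-envelope calculation helps explain why consumer cards fall short. The sketch below estimates the memory needed for model weights alone, assuming the parameter counts implied by the names "Wan-14B" and "T2V-1.3B" and assuming 16-bit weights. It deliberately ignores activations, the text encoder, and the VAE, so real usage is higher, which is consistent with the figure approaching 80GB quoted above.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory for the weights only, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Parameter counts are assumptions read off the model names.
MODELS = {"Wan-14B": 14e9, "T2V-1.3B": 1.3e9}

for name, params in MODELS.items():
    gb_16bit = weight_memory_gb(params, 2)  # 2 bytes per 16-bit weight
    print(f"{name}: about {gb_16bit:.1f} GB of weights at 16-bit precision")
```

Under these assumptions the 14B weights alone exceed the 24 GB of an RTX 4090, while the 1.3B model fits comfortably.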
How to Access Wan 2.1 via Novita AI?
Novita AI is an AI cloud platform that gives developers an easy way to deploy AI models through a simple API, along with an affordable and reliable GPU cloud for building and scaling.
Step 1: Log In and Access the Model Library
Log in to your account and click on the Model Library button.

Step 2: Choose Your Model
Browse through the available options and select the model that suits your needs.

Step 3: Start Your Free Trial

Step 4: Get Your API Key
To authenticate with the API, you need an API key. On the API key page, copy your key as indicated in the image.
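Once you have a key, it is safer to read it from the environment than to paste it into source code. The helper below is a minimal sketch: the environment variable name NOVITA_API_KEY and the function name are hypothetical, and the Bearer scheme is an assumption that should be confirmed against the developer docs.

```python
import os


def build_headers(api_key=None):
    """Build request headers, reading the key from the environment if needed."""
    # NOVITA_API_KEY is a hypothetical variable name chosen for this example.
    key = api_key or os.environ.get("NOVITA_API_KEY")
    if not key:
        raise RuntimeError("Set NOVITA_API_KEY or pass api_key explicitly")
    return {
        "Content-Type": "application/json",
        # Assumption: the API expects a Bearer token.
        "Authorization": f"Bearer {key}",
    }


headers = build_headers("example-key")
print(sorted(headers))
```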

Step 5: Install the API
Install the API client using the package manager for your programming language.

After installation, import the necessary libraries into your development environment and use your API key to start calling the Wan 2.1 I2V endpoint. The following Python example submits an image-to-video request with the requests library.
import requests

# Asynchronous image-to-video endpoint.
url = "https://api.novita.ai/v3/async/wan-i2v"

# "<string>", 123, and {} are placeholders; see the developer docs
# for the accepted values of each field.
payload = {
    "extra": {
        "webhook": {
            "url": "<string>",
            "test_mode": {
                "enabled": True,
                "return_task_status": "<string>",
            },
        }
    },
    "model_name": "<string>",
    "image_url": "<string>",
    "width": 123,
    "height": 123,
    "loras": [  # up to 3 entries
        {
            "path": "<string>",
            "scale": {},
        }
    ],
    "seed": 123,
    "prompt": "<string>",
    "negative_prompt": "<string>",
    "steps": 123,
    "guidance_scale": 123,
    "flow_shift": 123,
    "enable_safety_checker": True,
}

headers = {
    "Content-Type": "application/json",
    # Assumes Bearer-token auth; replace <YOUR_API_KEY> with your own key.
    "Authorization": "Bearer <YOUR_API_KEY>",
}

response = requests.post(url, json=payload, headers=headers)
print(response.text)
Frequently Asked Questions
Wan 2.1 I2V is an advanced model for generating high-quality videos from textual or image inputs. Its uniqueness lies in its high-fidelity motion reproduction, temporal consistency, and multilingual support for text-to-video generation.
Competitive pricing: $0.40 per 5-second 720P video compared to $2.39 on similar platforms.
Easy-to-use API with detailed documentation for developers.
You can use Wan 2.1 I2V via the Novita AI platform. Simply log in, select the model, obtain your API key, and integrate the API into your development environment.
Novita AI is the All-in-one cloud platform that empowers your AI ambitions. Integrated APIs, serverless, GPU Instance — the cost-effective tools you need. Eliminate infrastructure, start free, and make your AI vision a reality.
Recommended Reading
- Wan2.1 vs HunyuanVideo: Architecture, Efficiency, and Quality
- Wan2.1 vs Mochi 1: Open-Source AI Video Generation Models' War
- Transforming Images with Ease: Image to Video AI API