Wan 2.1 14B Image-to-Video is live on Novita AI at $0.04/sec!

Novita AI prices Wan 2.1 I2V competitively.

For example, a 5-second Wan 2.1 I2V video costs only $0.30 at 720P and only $0.20 at 480P, which works out to $0.04 per second at 480P.
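To budget a batch of clips, the per-video prices quoted above can be turned into a quick estimate. This is only a minimal sketch: `PRICE_PER_5S_VIDEO` and `estimate_cost` are hypothetical names, and the figures are simply the ones quoted in this post, so check the pricing page for current rates.

```python
# Prices per 5-second video as quoted in this post (USD); verify before relying on them.
PRICE_PER_5S_VIDEO = {"720P": 0.30, "480P": 0.20}

def estimate_cost(resolution: str, num_videos: int) -> float:
    """Return the estimated cost in USD for `num_videos` 5-second clips."""
    if resolution not in PRICE_PER_5S_VIDEO:
        raise ValueError(f"Unknown resolution: {resolution!r}")
    return round(PRICE_PER_5S_VIDEO[resolution] * num_videos, 2)

print(estimate_cost("720P", 10))  # ten 5-second clips at 720P
print(estimate_cost("480P", 10))  # ten 5-second clips at 480P
```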

Currently supports up to 3 LoRAs!
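If you assemble the `loras` field of the request payload programmatically, it may help to enforce this limit on the client side before sending anything. The helper below is a hypothetical sketch: `build_loras` is not part of any Novita AI SDK, and treating `scale` as a plain number is an assumption to check against the API docs.

```python
MAX_LORAS = 3  # limit stated in this post

def build_loras(entries):
    """Validate (path, scale) pairs and return them in the payload's `loras` shape."""
    entries = list(entries)
    if len(entries) > MAX_LORAS:
        raise ValueError(f"At most {MAX_LORAS} LoRAs are supported, got {len(entries)}")
    # `scale` is assumed to be numeric here; confirm the expected type in the docs.
    return [{"path": path, "scale": scale} for path, scale in entries]

# Placeholder paths, not real LoRA files.
loras = build_loras([("<lora-path-1>", 1.0), ("<lora-path-2>", 0.8)])
print(len(loras))
```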

Start a free trial on Novita AI today. To integrate the Wan 2.1 API, visit our developer docs for more details.

Wan 2.1 I2V (Image-to-Video) is a video generation model built on two core components, Wan-VAE and a Video Diffusion DiT. It offers high-fidelity video reconstruction, efficient compression, and text-guided video generation, backed by a large, cleaned training dataset.

Key Innovations of Wan 2.1

1. Wan-VAE

Overview

  • A 3D variational autoencoder (VAE) designed for efficient compression and high-fidelity motion reproduction.
  • Capable of encoding and decoding 1080P videos while maintaining temporal coherence.
  • Integrates multiple strategies to optimize spatio-temporal compression, reduce memory usage, and ensure temporal causality.
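Temporal causality means that the representation of a frame depends only on that frame and the ones before it. The toy sketch below illustrates the idea with a one-dimensional filter over per-frame scalar values. It is not the Wan-VAE architecture (which operates on full video tensors); the function name, kernel, and zero padding are illustrative assumptions.

```python
def causal_temporal_filter(frames, kernel):
    """Filter a per-frame signal along time so output[t] depends only on frames[0..t]."""
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(frames)  # pad the past only, never the future
    return [
        sum(w * padded[t + i] for i, w in enumerate(kernel))
        for t in range(len(frames))
    ]

clip = [1.0, 2.0, 3.0, 4.0]
edited = [1.0, 2.0, 3.0, 99.0]  # only the last frame differs
kernel = [0.25, 0.25, 0.5]      # weights for frames t-2, t-1, t

out_a = causal_temporal_filter(clip, kernel)
out_b = causal_temporal_filter(edited, kernel)
print(out_a[:3] == out_b[:3])  # True: earlier outputs ignore the later frame
```

Because the padding is applied only on the past side, changing a later frame never alters the outputs for earlier frames.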

Problems Addressed

  • Efficient Compression: Reduces storage and computational requirements for video data.
  • High-fidelity Reconstruction: Ensures the generated videos are of high quality and motion is coherent.
  • Temporal Consistency: Avoids common issues like frame discontinuity or jitter in generated videos.

2. Video Diffusion DiT

Overview

  • Built on Diffusion Transformers, enhanced by the Flow Matching framework.
  • Supports multilingual text input (via T5 Encoder) and text embedding (cross-attention).
  • Uses a shared MLP to predict modulation parameters for time embeddings, enabling each transformer block to learn distinct biases, thereby improving performance.
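To make the Flow Matching idea concrete, here is a minimal sketch of how one training pair can be built, assuming the common straight-line (rectified flow) formulation. The post does not spell out the exact variant Wan 2.1 uses, so treat the interpolation and the helper name `flow_matching_pair` as illustrative assumptions. In training, the transformer would be asked to predict `v` from `x_t`, `t`, and the text condition.

```python
import random

def flow_matching_pair(x1, t, rng):
    """Build one training example: the noisy point x_t and the target velocity v.

    x1: clean sample (list of floats), t: time in [0, 1], rng: random.Random.
    """
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]               # Gaussian noise
    x_t = [(1 - t) * n + t * d for n, d in zip(x0, x1)]  # point on the straight path
    v = [d - n for n, d in zip(x0, x1)]                  # constant velocity along it
    return x_t, v

rng = random.Random(0)
x1 = [0.5, -1.0, 2.0]
t = 0.3
x_t, v = flow_matching_pair(x1, t, rng)

# Following the target velocity for the remaining time lands on the clean sample.
recovered = [p + (1 - t) * u for p, u in zip(x_t, v)]
print(all(abs(a - b) < 1e-9 for a, b in zip(recovered, x1)))  # True
```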

Problems Addressed

  • Deep Integration of Text and Video Generation: Allows the model to better understand and generate videos according to textual descriptions.
  • Improved Generation Performance: Significantly enhances the quality and expressiveness of generated videos without increasing parameter count.
  • Multimodal Support: Handles multiple languages and input types, broadening application scenarios.

3. Candidate Dataset

Overview

  • A large-scale, curated, and deduplicated dataset of images and videos.
  • Employs a four-step data cleaning process, focusing on data dimensions, visual quality, and motion quality.
  • Builds a diverse and high-quality training set.
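A cleaning pipeline of this kind can be pictured as a chain of filters. The sketch below is a hypothetical illustration, not the actual Wan 2.1 pipeline: the field names, the thresholds, and the use of an exact SHA-256 hash for deduplication are all assumptions chosen to keep the example small.

```python
import hashlib

def clean_dataset(samples, min_side=256, min_visual=0.5, min_motion=0.5):
    """Deduplicate samples, then filter by dimensions, visual quality, and motion quality."""
    seen = set()
    kept = []
    for sample in samples:
        digest = hashlib.sha256(sample["data"]).hexdigest()
        if digest in seen:                      # exact duplicate of an earlier sample
            continue
        seen.add(digest)
        if min(sample["width"], sample["height"]) < min_side:
            continue                            # too small
        if sample["visual_score"] < min_visual:
            continue                            # poor visual quality
        if sample["motion_score"] < min_motion:
            continue                            # poor motion quality
        kept.append(sample)
    return kept

samples = [
    {"data": b"clip-a", "width": 1280, "height": 720, "visual_score": 0.9, "motion_score": 0.8},
    {"data": b"clip-a", "width": 1280, "height": 720, "visual_score": 0.9, "motion_score": 0.8},
    {"data": b"clip-b", "width": 160, "height": 120, "visual_score": 0.9, "motion_score": 0.8},
    {"data": b"clip-c", "width": 1280, "height": 720, "visual_score": 0.2, "motion_score": 0.8},
]
print(len(clean_dataset(samples)))  # 1: only the first clip survives
```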

Problems Addressed

  • Data Noise and Redundancy: Effectively removes low-quality or duplicate data, improving the effectiveness of training data.
  • Diversity and Quality: Provides the model with rich and clean samples, enhancing generalization and generation capabilities.
  • Large-scale Training: Supports efficient training on large, high-quality datasets.

Vbench of Wan 2.1

Wan 2.1 (Wan-14B) demonstrates excellent performance in core tasks such as ID consistency, physical plausibility, and smoothness. Its overall weighted score is among the highest in the industry, making it one of the leading video generation models available today. However, there is still room for improvement in areas like stylization ability and camera control.

Hardware Requirements of Wan 2.1

Wan 2.1 has high hardware requirements, especially for high-resolution and large-model tasks. The memory requirement for Wan 2.1 I2V approaches 80GB. It is recommended to use multiple high-end, data center-grade GPUs (such as A100, H100, or H20) to meet the memory and speed demands. Consumer-grade GPUs are only suitable for small models and low-resolution scenarios.

GPU Model | Single Card Compatible | Multi-GPU Recommendation | Recommendation Level
RTX 4090  | No                     | No                       | Only for T2V-1.3B at 480P
H20       | Not Supported          | 4 GPUs or 8 GPUs         | ★★★
A800/A100 | Supported              | 4 GPUs or 8 GPUs         | ★★★★
H800/H100 | Supported              | 4 GPUs or 8 GPUs         | ★★★★★
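A rough back-of-the-envelope calculation shows why a 14B-parameter model is out of reach for most consumer cards. The sketch below only counts the memory needed to hold the weights; `weight_memory_gb` is a hypothetical helper, and the remainder of the roughly 80GB figure would presumably come from activations and the other model components.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory needed just to store the weights, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

print(weight_memory_gb(14e9, 2))  # 28.0 GB for 16-bit weights
print(weight_memory_gb(14e9, 4))  # 56.0 GB for 32-bit weights
```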

How to Access Wan 2.1 via Novita AI?

Novita AI is an AI cloud platform that gives developers an easy way to deploy AI models through a simple API, along with an affordable and reliable GPU cloud for building and scaling.

Step 1: Log In and Access the Model Library

Log in to your account and click on the Model Library button.

Step 2: Choose Your Model

Browse through the available options and select the model that suits your needs.

Step 3: Start Your Free Trial

Open the Wan 2.1 I2V model page and start your free trial to try the API.

Step 4: Get Your API Key

To authenticate with the API, you need an API key, which Novita AI provides for your account. Open the API key page and copy your key from there.
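Rather than pasting the key into source code, a common pattern is to read it from an environment variable. The sketch below assumes a Bearer-token `Authorization` header and a hypothetical variable name `NOVITA_API_KEY`; neither is taken from this post, so confirm the expected header format in the developer docs.

```python
import os

def build_headers(api_key=None):
    """Return request headers, reading the key from NOVITA_API_KEY if not passed in."""
    key = api_key or os.environ.get("NOVITA_API_KEY")  # hypothetical variable name
    if not key:
        raise RuntimeError("Set NOVITA_API_KEY or pass api_key explicitly")
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {key}",  # assumed Bearer scheme
    }

print(sorted(build_headers("example-key")))  # header names only, key not printed
```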

Step 5: Install the API

Install the client library using the package manager for your programming language. The Python example below only needs the requests package.

After installation, import the necessary libraries into your development environment and authenticate with your API key. The following example calls the Wan 2.1 I2V endpoint from Python. The endpoint is asynchronous, so the call submits a generation task rather than returning the finished video directly.

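import requests

# Asynchronous Wan 2.1 I2V endpoint
url = "https://api.novita.ai/v3/async/wan-i2v"

# All values below are placeholders; replace them with real settings from the API docs.
payload = {
    # Optional webhook that is notified about the task
    "extra": {
        "webhook": {
            "url": "<string>",
            "test_mode": {
                "enabled": True,
                "return_task_status": "<string>",
            },
        }
    },
    "model_name": "<string>",
    "image_url": "<string>",       # source image to animate
    "width": 123,                  # output width in pixels
    "height": 123,                 # output height in pixels
    "loras": [                     # up to 3 LoRAs
        {
            "path": "<string>",
            "scale": {},
        }
    ],
    "seed": 123,
    "prompt": "<string>",          # text describing the desired video
    "negative_prompt": "<string>", # text describing what to avoid
    "steps": 123,
    "guidance_scale": 123,
    "flow_shift": 123,
    "enable_safety_checker": True,
}
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer <YOUR_API_KEY>",
}

# Submit the generation task and print the raw response
response = requests.post(url, json=payload, headers=headers)

print(response.text)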
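Because the endpoint is asynchronous, the response to the request above is not the finished video, so a client typically polls for the result (or relies on the webhook). The sketch below shows one way to structure that polling loop. The task-result URL, the `task_id` parameter, the response shape, and the status strings are assumptions that should be checked against the Novita AI developer docs; `wait_for_task` and `fetch_status_http` are hypothetical helper names.

```python
import time

# Assumed values; confirm the endpoint and status strings in the Novita AI docs.
TASK_RESULT_URL = "https://api.novita.ai/v3/async/task-result"
DONE, FAILED = "TASK_STATUS_SUCCEED", "TASK_STATUS_FAILED"

def fetch_status_http(task_id, api_key):
    """Fetch the current task state over HTTP (requires the requests package)."""
    import requests  # imported lazily so the polling logic has no hard dependency
    resp = requests.get(
        TASK_RESULT_URL,
        params={"task_id": task_id},
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

def wait_for_task(fetch_status, task_id, interval=5.0, max_attempts=60, sleep=time.sleep):
    """Call fetch_status(task_id) until the task succeeds, fails, or attempts run out."""
    for _ in range(max_attempts):
        result = fetch_status(task_id)
        status = result.get("task", {}).get("status")
        if status == DONE:
            return result
        if status == FAILED:
            raise RuntimeError(f"Task {task_id} failed: {result}")
        sleep(interval)
    raise TimeoutError(f"Task {task_id} did not finish after {max_attempts} attempts")

# Example wiring (not executed here, since it needs a real key and task ID):
# result = wait_for_task(lambda tid: fetch_status_http(tid, "<YOUR_API_KEY>"), "<task_id>")
```

Passing the fetch function in as an argument keeps the retry logic independent of the HTTP client, which also makes it easy to exercise with a stub.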

Frequently Asked Questions

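What is Wan 2.1 I2V, and what makes it unique?

Wan 2.1 I2V is an advanced model for generating high-quality videos from image inputs guided by text prompts. It stands out for its high-fidelity motion reproduction, temporal consistency, and multilingual text support.

What are the hardware requirements for Wan 2.1 I2V?

Running Wan 2.1 I2V yourself requires close to 80GB of GPU memory, so multiple data center-grade GPUs (such as A100, H100, or H20) are recommended. Consumer-grade GPUs are only suitable for small models and low-resolution scenarios.

Why use Wan 2.1 I2V on Novita AI?

Competitive pricing: $0.40 per 5-second 720P video compared to $2.39 on similar platforms.
Easy-to-use API with detailed documentation for developers.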

How can I access Wan 2.1 I2V?

You can use Wan 2.1 I2V via the Novita AI platform. Simply log in, select the model, obtain your API key, and integrate the API into your development environment.

Novita AI is an all-in-one cloud platform that empowers your AI ambitions. Integrated APIs, serverless, and GPU instances: the cost-effective tools you need. Eliminate infrastructure, start free, and make your AI vision a reality.
