Novita AI has officially launched the latest Wan 2.2 API, a cutting-edge tool for text-to-video generation. This article will introduce what Wan 2.2 is, highlight its new features and updates, and discuss its performance. Additionally, we’ll address common questions to help you get started with this powerful technology.
What is Wan 2.2 T2V?
Wan 2.2 T2V is Alibaba’s latest open-source text-to-video generative AI model, representing a major upgrade over the earlier Wan 2.1 system. It’s part of Alibaba’s “Wan” series of video generation models (often referred to as Tongyi Wanxiang in Chinese) and is notable for being the industry’s first open-source video model that uses a Mixture-of-Experts (MoE) architecture. Wan 2.2 actually encompasses a suite of models, including a dedicated text-to-video model and related tools, but “Wan 2.2 T2V” specifically refers to the text-to-video component of this series.
Wan 2.2 T2V Specifications
| Category | Description |
|---|---|
| Model Architecture | Uses a Mixture-of-Experts (MoE) architecture with two expert sub-models: a high-noise expert and a low-noise expert. |
| Parameter Count | The total model has 27 billion parameters, but only 14 billion are active during inference. |
| Design Advantages | By using specialized “experts” (each around 14B parameters), the model doubles in size while maintaining similar runtime costs compared to its predecessor, Wan 2.1 (14B parameters). |
| Released Model Variants | 1. T2V-A14B: A text-to-video model for generating videos from text. 2. TI2V-5B: A hybrid model that handles both text-to-video and image-to-video tasks, optimized for consumer-grade hardware (5B parameters). |
| Hardware Optimization | TI2V-5B is optimized for consumer-grade GPUs, such as running on a single NVIDIA RTX 4090. |
| Resolution and Frame Rate | The standard Wan 2.2 T2V model can generate 5-second-long videos at 720p resolution (1280×720) with 24 frames per second. |
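The resolution and frame-rate spec above translates into concrete numbers the model must produce. As a rough sketch (the raw-RGB size is a back-of-envelope figure, not a statement about Wan 2.2's internal representation):

```python
# Back-of-envelope numbers for the standard Wan 2.2 T2V output spec:
# a 5-second clip at 1280x720 and 24 frames per second.
width, height, fps, seconds = 1280, 720, 24, 5

total_frames = fps * seconds                   # frames the model must generate
raw_bytes = width * height * 3 * total_frames  # uncompressed 8-bit RGB size

print(total_frames)              # 120
print(raw_bytes / (1024 ** 2))   # ~316 MiB of uncompressed pixels
```

In other words, a single clip is 120 coherent frames, which is why temporal consistency is the hard part of text-to-video generation.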
Wan 2.2 T2V Key Features
Cinematic Quality & Control
- Trained on a meticulously curated dataset with aesthetic labels to generate videos with a cinematic look and feel.
- Supports fine-grained text control, allowing users to specify:
- Lighting conditions
- Time of day
- Color tone
- Camera angles
- Focal length
- Other cinematic aspects.
- Understands cinematic terms such as “golden hour lighting” and “wide-angle lens,” ensuring precise control over the video output.
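The fine-grained controls above can be exercised simply by composing the controllable attributes into the prompt string. A minimal sketch (the attribute names are illustrative, not an official Wan 2.2 schema):

```python
# Compose a cinematic prompt from the controllable attributes listed above.
# Attribute names are illustrative; Wan 2.2 reads them as free text.
def build_prompt(subject, lighting=None, time_of_day=None,
                 color_tone=None, camera=None, focal_length=None):
    parts = [subject]
    for attr in (lighting, time_of_day, color_tone, camera, focal_length):
        if attr:
            parts.append(attr)
    return ", ".join(parts)

prompt = build_prompt(
    "a lone sailboat on a calm sea",
    lighting="golden hour lighting",
    color_tone="warm color grading",
    camera="wide-angle lens",
)
print(prompt)
# a lone sailboat on a calm sea, golden hour lighting, warm color grading, wide-angle lens
```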
Multi-Modal Generative Suite
- Includes a style transfer functionality:
- Enables one-click application of artistic styles, such as converting photos or videos into cartoon or sketch formats (veo-video.org).
- Provides a unified model family that supports various generative tasks, making it a comprehensive creative AI platform.
Open Source & Community Ecosystem
Licensed under Apache 2.0, permitting commercial use (hackernoon.com). Supported by an active community that contributes:
- Guides
- Integration tools (e.g., for ComfyUI)
- Fine-tuning optimizations
- General support.
What Work Process Optimizations are in Wan 2.2?

Wan 2.2 T2V vs Wan 2.1 T2V
Wan 2.2 T2V vs Wan 2.1 T2V: Architecture
| Aspect | Wan 2.1 | Wan 2.2 |
|---|---|---|
| Architecture | Single-stage Diffusion Transformer (UNet). | Two-stage Mixture-of-Experts (MoE) with High-Noise and Low-Noise Experts. |
| Parameters | 14B (base) and 1.3B (small). | 27B total (14B active); 14B T2V, 14B I2V, and 5B hybrid model. |
| Training Data | Large dataset, less curated. | +65% images, +83% videos, annotated for aesthetics and cinematic attributes. |
| Output Quality | Good but prone to flickering; suited for simpler, stylized videos. | Higher detail, better temporal consistency, realism, and cinematic visuals. |
| Features | T2V, I2V, editing (VACE framework), LoRA fine-tuning supported. | T2V, I2V, better style transfer; no VACE yet, limited LoRA compatibility. |
Wan 2.2 T2V vs Wan 2.1 T2V: Performance

Wan 2.2 T2V vs Wan 2.1 T2V: Generation
Cost and Access of Wan 2.2 T2V
Hardware Costs
| Model | Minimum VRAM Requirement (GB) | Minimum GPU Model | Minimum GPU Quantity | Single GPU Speed (s) (480P) | Single GPU Speed (s) (720P) | Approximate GPU Price (USD) |
|---|---|---|---|---|---|---|
| TI2V-5B | 22.6 | NVIDIA RTX 4090 | 1 | 534.7 | 524.8 | $1,599 |
| T2V-A14B | 41.3 | NVIDIA A100 | 1 | 1133.9 | 4048.7 | $10,000 – $15,000 |
Notes:
- NVIDIA RTX 4090: Released in October 2022 with an MSRP of $1,599.
- NVIDIA A100: Prices vary based on configuration and market factors. The 40GB PCIe model typically ranges from $10,000 to $12,000, while the 80GB PCIe model ranges from $12,000 to $15,000.
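A quick way to compare buying hardware with using a hosted API is a break-even calculation. The sketch below uses the table's RTX 4090 price and an illustrative $0.4/video API rate, and ignores electricity and maintenance:

```python
# Rough break-even: local RTX 4090 (TI2V-5B) vs a hosted API.
# Assumes the figures above: $1,599 GPU MSRP and $0.4 per API video.
# Electricity, cooling, and maintenance are deliberately ignored.
gpu_price = 1599.0         # USD, RTX 4090 MSRP
api_price_per_video = 0.4  # USD per generated video via the API

break_even_videos = gpu_price / api_price_per_video
print(break_even_videos)   # ~3998 videos before the GPU pays for itself
```

Below that volume, API pricing is usually the cheaper route for a small team.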
API Costs
Novita AI is an AI cloud platform that offers developers an easy way to deploy AI models through a simple API, along with an affordable, reliable GPU cloud for building and scaling.
| Model | Price | Resolution | Video Length |
|---|---|---|---|
| Wan 2.1 T2V | $0.3/video | 1280×720 | 5s |
| Wan 2.2 T2V | $0.4/video | 1080p | 5s |
Wan 2.2 T2V Access Guide
Step 1: Log In and Access the Model Library
Log in to your account and click on the Model Library button.

Step 2: Choose Your Model
Browse through the available options and select the model that suits your needs.

Step 3: Get Your API Key
To authenticate with the API, you need an API key. Open the "Settings" page and copy your API key as indicated in the image.

Step 4: Install the API
Install the client library using the package manager for your programming language (for the Python example below, `pip install requests`).

After installation, import the necessary libraries into your development environment and initialize the client with your API key. Below is a Python example that submits an asynchronous Wan 2.2 text-to-video request.
```python
import requests

url = "https://api.novita.ai/v3/async/wan-2.2-t2v"

payload = {
    "input": {
        "prompt": "<string>",          # your text description of the video
        "negative_prompt": "<string>"  # elements to exclude (optional)
    },
    "parameters": {
        "size": "<string>",    # output resolution
        "prompt_extend": True, # let the service expand short prompts
        "seed": 123            # fix the seed for reproducible results
    }
}
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer <your-api-key>"
}

response = requests.post(url, json=payload, headers=headers)
print(response.json())
```
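Because the endpoint is asynchronous, the response returns a task identifier rather than the finished video, and you poll until the clip is ready. The sketch below illustrates that pattern; the result endpoint and response field names are assumptions, not confirmed Novita AI names, so check the official API reference before using them.

```python
# Hedged polling sketch for the async task created above. The result URL
# and the "task"/"status" field names are ASSUMPTIONS for illustration.
import time
import requests

RESULT_URL = "https://api.novita.ai/v3/async/task-result"  # assumed endpoint

def wait_for_video(task_id, api_key, interval=10, timeout=600, get=requests.get):
    """Poll until the task succeeds, fails, or the timeout expires."""
    headers = {"Authorization": f"Bearer {api_key}"}
    deadline = time.time() + timeout
    while time.time() < deadline:
        data = get(RESULT_URL, params={"task_id": task_id}, headers=headers).json()
        status = data.get("task", {}).get("status")  # assumed field names
        if status == "TASK_STATUS_SUCCEED":
            return data
        if status == "TASK_STATUS_FAILED":
            raise RuntimeError(f"generation failed: {data}")
        time.sleep(interval)
    raise TimeoutError("video generation did not finish in time")
```

The `get` parameter exists so the polling logic can be tested without network access; in production you simply omit it.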
Common Wan 2.2 T2V Issues and Fixes
Installation and GPU Compatibility
- Issue: Errors on older GPUs (e.g., GTX 10-series) due to FlashAttention.
- Solution: Use compatible GPUs such as the RTX 30/40-series or A-series. Alternatively, disable FlashAttention (`--disable_flashattn`) or replace it with xFormers for slower but functional performance.
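For local setups, the fallback decision can be made programmatically. A minimal sketch, assuming the common PyPI package names (`flash_attn`, `xformers`); the flag your particular runner exposes may differ:

```python
# Pick the best available attention backend: FlashAttention on newer GPUs,
# xFormers as a slower fallback, plain PyTorch attention as a last resort.
def pick_attention_backend():
    try:
        import flash_attn  # noqa: F401  (needs Ampere-or-newer GPUs)
        return "flash_attention"
    except ImportError:
        try:
            import xformers  # noqa: F401
            return "xformers"
        except ImportError:
            return "vanilla"  # plain PyTorch attention, works everywhere
```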
Slow Generation Speed
- Issue: Extremely slow output, especially on modest GPUs.
- Solution:
- Optimize step count (30–50 steps are often sufficient).
- Use the smaller TI2V-5B model for faster results.
- Ensure correct expert switching settings (default configurations are recommended).
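The speed fixes above can be summarized as one configuration. The parameter names below mirror common diffusion-pipeline options and are not an official Wan 2.2 CLI; adapt them to whichever runner you use (for example, a ComfyUI workflow):

```python
# Illustrative speed-oriented settings for local Wan 2.2 inference.
# Keys are hypothetical; map them onto your runner's actual options.
fast_config = {
    "model": "TI2V-5B",         # smaller hybrid model, fits one RTX 4090
    "sampling_steps": 40,       # 30-50 is usually sufficient
    "resolution": (832, 480),   # 480P previews render much faster than 720P
    "expert_switch": "default", # keep the default high/low-noise handover
}
```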
Output Quality Issues (Flicker/Artifacts)
- Issue: Flickering frames or artifacts in generated videos.
- Solution:
- Adjust CFG scale for better balance between precision and smoothness.
- Tweak the expert handover step for optimal diffusion.
- Enable temporal attention to maintain frame consistency.
- Use post-processing tools like frame interpolation if needed.
Prompt or Output Not as Expected
- Issue: Outputs differ from the described scenes or include unwanted elements.
- Solution:
- Rephrase and simplify prompts.
- Use negative prompts to exclude specific elements.
- Ensure correct model weights (e.g., don’t use I2V for text-only prompts).
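A negative prompt is passed alongside the main prompt in the request body, following the request shape shown in the access guide. The prompt text and size value below are illustrative:

```python
# Sketch of steering output with a negative prompt. The field layout
# matches the earlier request example; the values are illustrative.
payload = {
    "input": {
        "prompt": "a chef plating pasta in a bright kitchen, shallow depth of field",
        "negative_prompt": "text, watermark, extra hands, flickering, blurry",
    },
    "parameters": {
        "size": "1280*720",   # assumed size format; check the API reference
        "prompt_extend": True,
        "seed": 42,
    },
}
```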
LoRA and Fine-tuning Issues
- Issue: Old LoRA models from Wan 2.1 are incompatible with Wan 2.2.
- Solution: Wait for Wan 2.2-specific LoRAs or fine-tunes. Ensure any fine-tuning is tailored to the new two-expert architecture.
Pros and Cons of Wan 2.2 T2V for Small Businesses
| Aspect | Advantages | Disadvantages |
|---|---|---|
| Licensing & Cost | Free under Apache 2.0, no licensing fees, drastically lowers entry costs. | High computational costs for large-scale usage (cloud or electricity). |
| Content Quality | Cinematic-quality videos; in-house creation without hiring designers or videographers. | Unpredictable output quality; may require manual review and editing. |
| Creative Flexibility | Rapid prototyping with text prompts; quick turnaround for concept videos. | Slower for real-time or on-demand generation; better for pre-planned content. |
| Customization | Tailored to brand aesthetics via prompts or fine-tuning; open-source flexibility for deeper integration. | Requires expertise to craft prompts or fine-tune models effectively. |
| Scalability | Generate hundreds of videos easily; ideal for localized ads or A/B testing. | Expensive hardware (e.g., RTX 4090 or A100) needed for high-capacity use. |
| Community Support | Backed by open-source community; access to tutorials, updates, and tools like ComfyUI workflows. | No formal support or guarantees; reliance on community goodwill for troubleshooting. |
| Ease of Use | Simplifies video creation for small teams; acts as a “mini creative studio.” | Requires ML knowledge for setup (Python, CUDA, model parameters); steep learning curve. |
| Ethical & Legal | Enables innovation in AI-driven marketing. | Risks of generating unintended or inappropriate content; potential legal liabilities. |
Best for: Small businesses with technical expertise or access to consultants, aiming to reduce content creation costs and scale video production. Challenges: Requires careful planning, technical setup, and monitoring of hardware and costs.
Future Trends in Wan 2.2 T2V Technology

- Higher Resolution & Length
- Move towards 1080p, 4K, and longer clips (10–20 seconds).
- Improved coherence for extended videos via hierarchical generation.
- Enhanced Motion & Consistency
- Better motion stability and natural interactions.
- Specialized experts for different motion types (e.g., slow vs. fast).
- Video Editing & Multi-Modality
- Text commands for editing existing videos (e.g., scene changes or object removal).
- Integration of audio generation for complete video projects.
- Efficiency & Scalability
- Smaller, faster models (e.g., distilled 5B models with near 27B quality).
- Real-time video generation becomes feasible with hardware advancements.
- Community & Ecosystem Growth
- Niche fine-tunes (e.g., cartoon style, medical videos).
- Wider adoption through plugins and mobile apps.
- Ethics & Regulation
- Watermarks and metadata for AI-generated content.
- Standards ensuring transparency in use cases like advertising.
The release of the Wan 2.2 API marks a significant advancement in text-to-video technology. With higher resolutions, enhanced motion consistency, and improved efficiency, Wan 2.2 opens new possibilities for developers and creators. Its flexible API interface empowers you to bring your ideas to life, setting a new standard for video generation.
Frequently Asked Questions
What is Wan 2.2?
Wan 2.2 is an open-source text-to-video model capable of generating high-quality, motion-consistent videos suitable for applications like advertising, filmmaking, and more.
What's new in Wan 2.2 compared to Wan 2.1?
- Support for higher resolutions (up to 1080p).
- Improved temporal consistency, reducing flickering.
- Introduction of a Mixture-of-Experts (MoE) architecture for better handling of complex scenes.
How does Wan 2.2 perform?
Wan 2.2 excels in speed, memory optimization, and output quality. When paired with high-end GPUs, it can quickly generate high-resolution video.
Novita AI is an all-in-one cloud platform that empowers your AI ambitions, offering integrated APIs, serverless computing, and GPU instances: the cost-effective tools you need. Eliminate infrastructure overhead, start free, and make your AI vision a reality.
Recommended Reading
- Unleash Your Creativity: YouTube Videos Voiceovers Mastery
- 2024 YouTube Video Notes Taker AI Market and Leading Players
- Transforming Images with Ease: Image to Video AI API