Wan2.6 on Novita AI: Cinematic Creation Model with Role-Playing & Multi-Shot Control

Table Of Contents

What is Wan2.6?
Key Features of Wan2.6
Wan2.6 Model Variants on Novita AI
Getting Started with Wan2.6 on Novita AI
Text-to-Video Generation Example
Multi-Shot Prompt Structure
Conclusion

Wan2.6 is an AI video generation model that combines role-playing, multi-shot control, and audio-visual synchronization in a single system.

Wan2.6 is now available on Novita AI’s Model API platform, so developers and businesses can call it through a simple API integration without managing infrastructure.

This guide explores how to leverage Wan2.6 on Novita AI for text-to-video, image-to-video, and reference video generation.

Try Wan2.6 in the Novita AI Playground

What is Wan2.6?

[Video: sample clip generated by Wan2.6]

Wan2.6 is the latest generation of Alibaba Cloud’s video generation model series, specifically designed for professional film production and creative content scenarios.

Wan2.6 is positioned as a feature-complete video generation model, with capabilities intended to narrow the gap between amateur content creation and professional cinematography.

Core Technology

Wan2.6 employs advanced multimodal joint modeling to process reference videos. The system extracts temporal information about subject emotions, poses, and comprehensive visual features from multiple angles.

The model simultaneously captures acoustic characteristics, including voice timbre and speech rate. These elements serve as control conditions during generation to maintain complete sensory consistency from visuals to audio.

Technical Innovations

The model integrates several breakthrough technologies:

Multimodal Learning: Processes visual, audio, and temporal data simultaneously for coherent output
High-Level Semantic Understanding: Transforms simple prompts into professional multi-shot narratives with complete storylines
Unified Modeling: Maintains consistency in core subjects, scene layouts, and environmental atmosphere across shot transitions
Audio-Visual Synchronization: Aligns lip movement and audio with the visual content

Key Features of Wan2.6

1. Role-Playing Capability

Wan2.6’s signature feature allows users to upload personal videos and transform themselves into characters in professional-quality scenes.

The model handles:

Single and Multi-Character Performances: Supports solo performances or group interactions
Emotion and Gesture Transfer: Captures and replicates nuanced expressions and movements
Cross-Style Transformation: Applies different genres (sci-fi, suspense, romance) to source footage
Professional Acting Simulation: Generates cinema-quality performances from ordinary user videos

2. Multi-Shot Control and Transitions

The model excels at professional-grade shot composition and transitions:

Automatic Shot Planning: Converts simple prompts into multi-shot scripts
Seamless Transitions: Smooth cuts between different camera angles and perspectives
Narrative Coherence: Maintains story continuity across multiple shots
Consistency Preservation: Keeps characters, settings, and atmosphere unified throughout

3. Extended Video Duration

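Wan2.6 supports up to 15 seconds per generation, described as the longest single-generation duration available in China’s AI video market. On Novita AI, the Text-to-Video and Image-to-Video variants support 15 seconds, while the Reference Video variant is limited to 10 seconds.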

This extended duration enables more complex storytelling and complete scene development without requiring multiple generations and stitching.

4. Audio-Visual Synchronization

The model aligns audio and visual elements in three ways:

Lip-Sync Accuracy: Precise mouth movement matching for dialogue
Sound-Driven Animation: Audio cues drive character movements and expressions
Environmental Audio: Contextually appropriate background sounds and effects

5. Enhanced Quality Metrics

Recent upgrades have significantly improved multiple aspects of the model:

Improved Visual Fidelity: Higher resolution and detail quality
Better Audio Effects: Professional-grade sound design
Superior Prompt Following: More accurate interpretation of complex instructions
Cinematic Camera Work: Professional cinematography techniques applied automatically

Wan2.6 Model Variants on Novita AI

Novita AI provides three distinct API endpoints for Wan2.6, each optimized for specific use cases and accessible through the Model API platform.

Text-to-Video (T2V)

Generate videos directly from text prompts without requiring input images or videos.

Ideal for creating original content from creative descriptions with multi-shot control and narrative sequencing.

Key Capabilities:

Multi-shot narrative generation from sequential prompts
Automatic shot type selection and camera movements
Cinematic transitions between scenes
Support for 5-, 10-, and 15-second video durations

Technical Specifications:

Parameter	Supported Values	Notes
Duration	5s, 10s, 15s	Choose based on content complexity
Resolution	1280×720, 720×1280, 960×960, 1088×832, 832×1088, 1920×1080, 1080×1920, 1440×1440, 1632×1248, 1248×1632	Does not support 480P
Model ID	`wan2.6-t2v`	Use this identifier in API calls

Learn more: Wan2.6 Text-to-Video API Documentation
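The limits in the T2V table can be checked on the client before a request is submitted. The following is a minimal sketch, not part of any Novita SDK: `validate_t2v_params` is a hypothetical helper, and the `WIDTHxHEIGHT` string form is an assumption based on the `size` examples shown later in this guide.

```python
# Hypothetical client-side check; values are copied from the T2V table above.
T2V_DURATIONS = {5, 10, 15}
T2V_SIZES = {
    "1280x720", "720x1280", "960x960", "1088x832", "832x1088",
    "1920x1080", "1080x1920", "1440x1440", "1632x1248", "1248x1632",
}


def validate_t2v_params(size: str, duration: int) -> list[str]:
    """Return a list of problems; an empty list means both values match the table."""
    problems = []
    # Accept the "×" used in the table as well as the ASCII "x".
    normalized = size.replace("×", "x").lower()
    if normalized not in T2V_SIZES:
        problems.append(f"unsupported size: {size}")
    if duration not in T2V_DURATIONS:
        problems.append(f"unsupported duration: {duration}s")
    return problems


if __name__ == "__main__":
    print(validate_t2v_params("1280x720", 5))
    print(validate_t2v_params("854x480", 7))
```

Returning a list of problems rather than raising on the first one lets a caller report every invalid value at once.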

Image-to-Video (I2V)

Animate static images into dynamic video sequences.

Perfect for bringing product photos, illustrations, or concept art to life with controlled motion and narrative context.

Key Capabilities:

Motion strength control for animation intensity
Multiple resolution options for different use cases
Prompt-guided animation direction
Character and object animation

Technical Specifications:

Parameter	Supported Values	Notes
Duration	5s, 10s, 15s	Extended duration for complex animations
Resolution	1080P, 720P	Does not support 480P
Model ID	`wan2.6-i2v`	Use this identifier in API calls

Learn more: Wan2.6 Image-to-Video API Documentation

Reference Video (R2V)

Transform existing videos with style transfer, role-playing, or scene modifications using reference video input.

Key Capabilities:

Role-playing and character replacement
Style transfer across visual genres
Audio-visual synchronization preservation
Multi-reference video support (1-2 videos recommended)

Technical Specifications:

Parameter	Supported Values
Duration	5s, 10s (does not support 15s)
Resolution	1280×720, 720×1280, 960×960, 1088×832, 832×1088, 1920×1080, 1080×1920, 1440×1440, 1632×1248, 1248×1632 (does not support 480P)
Video Format	MP4, MOV
File Size	< 30MB per file
Single Reference	Max 5s duration
Dual Reference	Max 2.5s each (3 videos not recommended)
Model ID	`wan2.6-v2v`

Important Notes: Reference videos and audio files cannot be supplied in the same request. The `reference_video_urls` parameter accepts an array of video URLs.

Learn more: Wan2.6 Reference Video API Documentation
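The reference-video limits above (format, file size, and the per-clip duration that depends on how many clips are supplied) are easy to get wrong, so a pre-flight check can help. This is a sketch under stated assumptions: `ReferenceVideo` and `check_references` are hypothetical names, the caller is assumed to already know each clip's duration and size, the format is inferred from the URL's file extension, and "30MB" is interpreted as 30 × 1024 × 1024 bytes.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from urllib.parse import urlparse

MAX_FILE_BYTES = 30 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes
ALLOWED_SUFFIXES = {".mp4", ".mov"}


@dataclass
class ReferenceVideo:
    """Hypothetical record of what the caller knows about one reference clip."""
    url: str
    duration_s: float
    size_bytes: int


def check_references(videos: list[ReferenceVideo]) -> list[str]:
    """Return a list of violations of the R2V table; empty means the set looks valid."""
    problems = []
    if len(videos) not in (1, 2):
        problems.append(f"expected 1 or 2 reference videos, got {len(videos)}")
    # One clip may run up to 5s; with two clips each may run up to 2.5s.
    max_len = 5.0 if len(videos) == 1 else 2.5
    for video in videos:
        suffix = PurePosixPath(urlparse(video.url).path).suffix.lower()
        if suffix not in ALLOWED_SUFFIXES:
            problems.append(f"{video.url}: format must be MP4 or MOV")
        if video.size_bytes >= MAX_FILE_BYTES:
            problems.append(f"{video.url}: file must be smaller than 30MB")
        if video.duration_s > max_len:
            problems.append(f"{video.url}: longer than {max_len}s")
    return problems
```

The URLs of the clips that pass the check can then be collected into the `reference_video_urls` array.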

Getting Started with Wan2.6 on Novita AI

Prerequisites

Before you begin, ensure you have:

Novita AI Account: Sign up at novita.ai; $1 in free credits is added automatically on registration
API Key: Create one in your Novita AI console
Development Environment: Python, Node.js, or any HTTP client

Asynchronous Request Flow

Wan2.6 on Novita AI uses an asynchronous processing model to handle generation requests efficiently:

Submit Request: POST to the appropriate endpoint with your parameters
Receive Task ID: API returns a task_id immediately
Poll for Results: Use the task ID to check generation status
Retrieve Output: Download the generated video once complete
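The submit-then-poll flow above can be sketched as a small loop. This is an illustration, not Novita's client code: `wait_for_task`, `fetch_result`, and `is_finished` are hypothetical names, and because the exact response schema is not described here, the caller supplies both the function that fetches the task result and the predicate that decides whether the task has finished.

```python
import time
from typing import Callable


def wait_for_task(
    fetch_result: Callable[[str], dict],
    task_id: str,
    is_finished: Callable[[dict], bool],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll fetch_result(task_id) until is_finished(result) is true or the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch_result(task_id)
        if is_finished(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish within {timeout_s}s")
        # Wait between polls so the status endpoint is not queried in a tight loop.
        sleep(interval_s)
```

In practice `fetch_result` would wrap the HTTP call to the task-result endpoint, and `is_finished` would inspect whatever status field the response carries. Passing `sleep` as an argument keeps the loop testable without real delays.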

Text-to-Video Generation Example

Here’s a complete example of generating a video from text using Wan2.6’s T2V API:

Step 1: Submit Generation Request

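import os

import requests

url = "https://api.novita.ai/v3/async/wan2.6-t2v"

# Read the API key from the environment instead of hard-coding it.
# NOVITA_API_KEY is an illustrative variable name, not one the platform requires.
api_key = os.environ["NOVITA_API_KEY"]

payload = {
    "input": {
        "prompt": "A boy sits alone in the corner of a playground, reading a letter.",
        "negative_prompt": "blurry, low quality",
        # "audio_url": "<string>",  # optional: HTTPS URL of an audio file to sync with
    },
    "parameters": {
        "seed": 123,              # any integer; reuse it to reproduce a result
        "size": "1280x720",       # one of the supported resolutions
        "audio": True,            # enable audio generation
        "duration": 5,            # 5, 10, or 15 seconds
        "shot_type": "wide_shot", # one of the options in the parameter table
        "watermark": True,        # add a watermark to the video
        "prompt_extend": True,    # let the service auto-enhance the prompt
    },
}
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}",
}

response = requests.post(url, json=payload, headers=headers, timeout=30)
response.raise_for_status()

# The response body contains the task_id needed to fetch the result.
print(response.text)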

Step 2: Get the video generation results

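import os

import requests

url = "https://api.novita.ai/v3/async/task-result"

# Replace with the task_id returned by the generation request.
task_id = "<task_id>"

headers = {
    "Content-Type": "application/json",
    # NOVITA_API_KEY is an illustrative environment variable name.
    "Authorization": f"Bearer {os.environ['NOVITA_API_KEY']}",
}

# Pass the task ID as a query parameter so the API knows which task to report on.
response = requests.get(url, params={"task_id": task_id}, headers=headers, timeout=30)
response.raise_for_status()

print(response.text)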

Key Parameters Explained

Parameter	Description	Options
`prompt`	Text description of desired video	Detailed scene description
`audio_url`	Optional audio file for sync	HTTPS URL to audio file
`negative_prompt`	Elements to avoid	Quality issues, unwanted objects
`seed`	Random seed for reproducibility	Any integer
`size`	Video resolution	"1280x720", "1920x1080", "720x1280", etc.
`duration`	Video length in seconds	5, 10, or 15
`shot_type`	Camera angle	"wide_shot", "medium_shot", "close_up"
`prompt_extend`	Auto-enhance prompt	true/false
`watermark`	Add watermark to video	true/false
`audio`	Enable audio generation	true/false

For complete API specifications and additional parameters, visit the Wan2.6 API Documentation.
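The request body groups these fields into two objects: `prompt`, `audio_url`, and `negative_prompt` go under `input`, and the rest go under `parameters`. A small builder can keep that split in one place. This is a sketch only; `build_t2v_payload` is a hypothetical helper, and the key set is taken from the table above rather than from the full API specification.

```python
# Keys that belong under "parameters", as listed in the table above.
PARAMETER_KEYS = {
    "seed", "size", "audio", "duration", "shot_type", "watermark", "prompt_extend",
}


def build_t2v_payload(prompt, *, negative_prompt=None, audio_url=None, **parameters):
    """Assemble a request body, omitting optional fields that were not provided."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    unknown = set(parameters) - PARAMETER_KEYS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    input_block = {"prompt": prompt}
    if negative_prompt is not None:
        input_block["negative_prompt"] = negative_prompt
    if audio_url is not None:
        input_block["audio_url"] = audio_url
    return {"input": input_block, "parameters": dict(parameters)}


if __name__ == "__main__":
    print(build_t2v_payload("A cat on a windowsill", size="1280x720", duration=5))
```

Rejecting unknown keys catches typos such as `prompt_extent` before the request leaves the client.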

Multi-Shot Prompt Structure

Wan2.6’s multi-shot capability enables you to create cohesive narrative sequences with multiple camera angles and scenes. To maximize the quality of multi-shot videos, follow this structured prompt format.

Prompt Structure Formula

Prompt = Overall Description + (Shot Number + Timestamp + Shot Content), repeated for each shot

Component Breakdown

1. Overall Description

Provide a brief overview of the entire video content. This section should describe:

Story theme and narrative style
Main emotions or core events
Overall tone and atmosphere

This helps the AI understand the global narrative direction and maintain consistency across shots.

2. Shot Number

Assign a sequential number to each shot to:

Distinguish different scenes or segments
Organize video structure clearly
Maintain logical flow between transitions

3. Timestamp

Specify the exact time range for each shot within the video timeline:

Ensures content aligns with video timing
Improves generation accuracy
Helps with precise shot duration control
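Because each shot carries its own time range, it is worth confirming that the ranges line up before the prompt is sent. The check below is a sketch: `check_shot_ranges` is a hypothetical helper, and the rules it applies (shots start at 0, follow one another without gaps or overlaps, and end within the requested duration) are assumptions about what a well-formed timeline looks like, not documented API requirements.

```python
def check_shot_ranges(ranges: list[tuple[int, int]], duration: int) -> list[str]:
    """Return problems found in (start, end) second ranges; empty means consistent."""
    problems = []
    expected_start = 0
    for number, (start, end) in enumerate(ranges, start=1):
        if start != expected_start:
            problems.append(
                f"shot {number} starts at {start}s, expected {expected_start}s"
            )
        if end <= start:
            problems.append(f"shot {number} has no positive length")
        # The next shot is expected to begin where this one ends.
        expected_start = end
    if ranges and ranges[-1][1] > duration:
        problems.append(
            f"last shot ends at {ranges[-1][1]}s, beyond the {duration}s video"
        )
    return problems


if __name__ == "__main__":
    print(check_shot_ranges([(0, 3), (3, 5), (5, 10)], duration=10))
```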

4. Shot Content

Provide detailed descriptions of each shot, including:

Main characters or objects and their specific behaviors
Actions, dialogue, expressions, and gestures
Camera angles and movements
Lighting and atmosphere details

Follow standard single-shot prompt writing conventions for this section.

Multi-Shot Prompt Example

Here’s a practical example demonstrating the complete structure:

This story is told from a third-person perspective, depicting a short drama about abandonment and the rekindling of hope.

Shot 1 [0-3 seconds]: A boy sits alone in the corner of a playground, head down, looking at a letter in his hands. He lets out a soft sigh, his eyes revealing confusion and uncertainty.

Shot 2 [3-5 seconds]: Hard cut transition, fixed camera position, focusing on the boy's eyes. Tears glisten, conveying a sense of loss and helplessness.

Shot 3 [5-10 seconds]: Hard cut transition, scene shifts to a simple classroom. A girl with gentle yet determined eyes, wearing modest clothing, approaches the boy with a warm and reassuring smile to comfort him.
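A prompt in this layout can also be assembled from structured data, which helps when shots are produced by a script or an editing tool. The sketch below mirrors the format of the example above; `Shot` and `build_multishot_prompt` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    """One shot: its time range in seconds and its description."""
    start_s: int
    end_s: int
    content: str


def build_multishot_prompt(overall: str, shots: list[Shot]) -> str:
    """Join the overall description and numbered, timestamped shots into one prompt."""
    blocks = [overall.strip()]
    for number, shot in enumerate(shots, start=1):
        blocks.append(
            f"Shot {number} [{shot.start_s}-{shot.end_s} seconds]: {shot.content.strip()}"
        )
    # Blank lines between blocks match the layout of the example above.
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(build_multishot_prompt(
        "A short drama about abandonment and the rekindling of hope.",
        [
            Shot(0, 3, "A boy sits alone in the corner of a playground."),
            Shot(3, 5, "Hard cut transition, fixed camera, focusing on the boy's eyes."),
        ],
    ))
```

Shot numbers are derived from list order, so reordering or removing a shot never leaves a gap in the numbering.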

Conclusion

Wan2.6 on Novita AI makes professional-style video production more accessible, giving creators control over role-playing, multi-shot narratives, and audio-visual synchronization.

Whether you’re a developer building video generation features, a marketer creating campaign content, or a filmmaker exploring pre-visualization, Novita AI’s Model API platform eliminates infrastructure complexity while delivering cinema-quality results.

Start generating professional videos today and transform your creative vision into reality within minutes.

Ready to get started? Create your Novita AI account and access Wan2.6 with free credits to experience the future of AI video generation.

Novita AI is a leading AI cloud platform that provides developers with easy-to-use APIs and affordable, reliable GPU infrastructure for building and scaling AI applications.
