Wan2.6 on Novita AI: Cinematic Creation Model with Role-Playing & Multi-Shot Control


Wan2.6 is an AI video generation model that combines role-playing, multi-shot control, and audio-visual synchronization in a single system, a feature set that sets it apart from competing models.

Now available on Novita AI’s Model API platform, developers and businesses can access this cutting-edge model through simple API integration without managing complex infrastructure.

This guide explores how to leverage Wan2.6 on Novita AI for text-to-video, image-to-video, and reference video generation.

What is Wan2.6?


Wan2.6 is the latest generation of Alibaba Cloud’s video generation model series, specifically designed for professional film production and creative content scenarios.

Its role-playing, multi-shot, and audio-sync capabilities are intended to bridge the gap between amateur content creation and professional cinematography.

Core Technology

Wan2.6 employs advanced multimodal joint modeling to process reference videos. The system extracts temporal information about subject emotions, poses, and comprehensive visual features from multiple angles.

The model simultaneously captures acoustic characteristics, including voice timbre and speech rate. These elements serve as control conditions during generation to maintain complete sensory consistency from visuals to audio.

Technical Innovations

The model integrates several breakthrough technologies:

  • Multimodal Learning: Processes visual, audio, and temporal data simultaneously for coherent output
  • High-Level Semantic Understanding: Transforms simple prompts into professional multi-shot narratives with complete storylines
  • Unified Modeling: Maintains consistency in core subjects, scene layouts, and environmental atmosphere across shot transitions
  • Audio-Visual Synchronization: Ensures perfect lip-sync and audio alignment with visual content

Key Features of Wan2.6

1. Role-Playing Capability

Wan2.6’s signature feature allows users to upload personal videos and transform themselves into characters in professional-quality scenes.

The model handles:

  • Single and Multi-Character Performances: Supports solo performances or group interactions
  • Emotion and Gesture Transfer: Captures and replicates nuanced expressions and movements
  • Cross-Style Transformation: Applies different genres (sci-fi, suspense, romance) to source footage
  • Professional Acting Simulation: Generates cinema-quality performances from ordinary user videos

2. Multi-Shot Control and Transitions

The model excels at professional-grade shot composition and transitions:

  • Automatic Shot Planning: Converts simple prompts into multi-shot scripts
  • Seamless Transitions: Smooth cuts between different camera angles and perspectives
  • Narrative Coherence: Maintains story continuity across multiple shots
  • Consistency Preservation: Keeps characters, settings, and atmosphere unified throughout

3. Extended Video Duration

Wan2.6 supports up to 15 seconds per generation, the longest single-generation duration available in China’s AI video market. (Reference video generation is capped at 10 seconds.)

This extended duration enables more complex storytelling and complete scene development without requiring multiple generations and stitching.

4. Audio-Visual Synchronization

The model aligns audio and visual elements in three ways:

  • Lip-Sync Accuracy: Precise mouth movement matching for dialogue
  • Sound-Driven Animation: Audio cues drive character movements and expressions
  • Environmental Audio: Contextually appropriate background sounds and effects

5. Enhanced Quality Metrics

Recent upgrades have significantly improved multiple aspects of the model:

  • Improved Visual Fidelity: Higher resolution and detail quality
  • Better Audio Effects: Professional-grade sound design
  • Superior Prompt Following: More accurate interpretation of complex instructions
  • Cinematic Camera Work: Professional cinematography techniques applied automatically

Wan2.6 Model Variants on Novita AI

Novita AI provides three distinct API endpoints for Wan2.6, each optimized for specific use cases and accessible through the Model API platform.

Text-to-Video (T2V)

Generate videos directly from text prompts without requiring input images or videos.

Ideal for creating original content from creative descriptions with multi-shot control and narrative sequencing.

Key Capabilities:

  • Multi-shot narrative generation from sequential prompts
  • Automatic shot type selection and camera movements
  • Cinematic transitions between scenes
  • Support for 5, 10, and 15-second video durations

Technical Specifications:

Parameter | Supported Values | Notes
Duration | 5s, 10s, 15s | Choose based on content complexity
Resolution | 1280×720, 720×1280, 960×960, 1088×832, 832×1088, 1920×1080, 1080×1920, 1440×1440, 1632×1248, 1248×1632 | Does not support 480P
Model ID | wan2.6-t2v | Use this identifier in API calls
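
A quick local check can catch unsupported settings before a request is sent. The sketch below is illustrative only: the helper name check_t2v_settings is hypothetical, and the allowed values are copied from the T2V specifications above rather than fetched from the API.

```python
# Allowed values copied from the T2V technical specifications.
T2V_DURATIONS = {5, 10, 15}
T2V_RESOLUTIONS = {
    (1280, 720), (720, 1280), (960, 960), (1088, 832), (832, 1088),
    (1920, 1080), (1080, 1920), (1440, 1440), (1632, 1248), (1248, 1632),
}


def check_t2v_settings(width: int, height: int, duration: int) -> list[str]:
    """Return a list of problems; an empty list means the settings look valid."""
    problems = []
    if (width, height) not in T2V_RESOLUTIONS:
        problems.append(f"unsupported resolution {width}x{height}")
    if duration not in T2V_DURATIONS:
        problems.append(f"unsupported duration {duration}s")
    return problems


print(check_t2v_settings(1280, 720, 10))  # a supported combination
print(check_t2v_settings(854, 480, 12))   # 480P and 12s are both rejected
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a single pass.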

Learn more: Wan2.6 Text-to-Video API Documentation

Image-to-Video (I2V)

Animate static images into dynamic video sequences.

Perfect for bringing product photos, illustrations, or concept art to life with controlled motion and narrative context.

Key Capabilities:

  • Motion strength control for animation intensity
  • Multiple resolution options for different use cases
  • Prompt-guided animation direction
  • Character and object animation

Technical Specifications:

Parameter | Supported Values | Notes
Duration | 5s, 10s, 15s | Extended duration for complex animations
Resolution | 1080P, 720P | Does not support 480P
Model ID | wan2.6-i2v | Use this identifier in API calls

Learn more: Wan2.6 Image-to-Video API Documentation

Reference Video (R2V)

Transform existing videos with style transfer, role-playing, or scene modifications using reference video input.

Key Capabilities:

  • Role-playing and character replacement
  • Style transfer across visual genres
  • Audio-visual synchronization preservation
  • Multi-reference video support (1-2 videos recommended)

Technical Specifications:

Parameter | Supported Values
Duration | 5s, 10s (does not support 15s)
Resolution | 1280×720, 720×1280, 960×960, 1088×832, 832×1088, 1920×1080, 1080×1920, 1440×1440, 1632×1248, 1248×1632 (no 480P)
Video Format | MP4, MOV
File Size | < 30MB per file
Single Reference | Max 5s duration
Dual Reference | Max 2.5s each (3 videos not recommended)
Model ID | wan2.6-v2v

Important Notes: Reference videos cannot be uploaded simultaneously with audio files. The reference_video_urls parameter accepts an array of video URLs.
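
The reference video limits above can be checked locally before upload. This is a minimal sketch under stated assumptions: ReferenceClip and check_reference_clips are hypothetical names, the caller is assumed to already know each clip's size and duration, and "30MB" is interpreted as 30 × 1024 × 1024 bytes.

```python
from dataclasses import dataclass

ALLOWED_FORMATS = {"mp4", "mov"}
MAX_FILE_BYTES = 30 * 1024 * 1024  # "< 30MB per file" (binary megabytes assumed)


@dataclass
class ReferenceClip:
    """Hypothetical local description of a clip; not part of the API."""
    filename: str
    size_bytes: int
    duration_s: float


def check_reference_clips(clips: list[ReferenceClip]) -> list[str]:
    """Return a list of problems; an empty list means the clips fit the limits."""
    problems = []
    if not clips:
        problems.append("at least one reference video is needed")
    elif len(clips) > 2:
        problems.append("more than 2 reference videos is not recommended")
    # Single reference: max 5s. Two references: max 2.5s each.
    max_duration = 5.0 if len(clips) <= 1 else 2.5
    for clip in clips:
        extension = clip.filename.rsplit(".", 1)[-1].lower()
        if extension not in ALLOWED_FORMATS:
            problems.append(f"{clip.filename}: format must be MP4 or MOV")
        if clip.size_bytes >= MAX_FILE_BYTES:
            problems.append(f"{clip.filename}: file must be under 30MB")
        if clip.duration_s > max_duration:
            problems.append(f"{clip.filename}: longer than {max_duration}s")
    return problems


print(check_reference_clips([ReferenceClip("actor.mp4", 10_000_000, 4.0)]))
```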

Learn more: Wan2.6 Reference Video API Documentation

Getting Started with Wan2.6 on Novita AI

Prerequisites

Before you begin, ensure you have:

  1. Novita AI Account: Sign up at novita.ai; $1 in free credits is added automatically upon registration
  2. API Key: Get it from your console
  3. Development Environment: Python, Node.js, or any HTTP client

Asynchronous Request Flow

Wan2.6 on Novita AI uses an asynchronous processing model to handle generation requests efficiently:

  1. Submit Request: POST to the appropriate endpoint with your parameters
  2. Receive Task ID: API returns a task_id immediately
  3. Poll for Results: Use the task ID to check generation status
  4. Retrieve Output: Download the generated video once complete
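
The polling step can be sketched as a small loop that is independent of any particular HTTP client. Everything here is an assumption for illustration: wait_for_task is a hypothetical helper, and the status strings ("QUEUED", "PROCESSING", "SUCCEEDED", "FAILED") are placeholders, so substitute the status values the API actually returns.

```python
import time
from typing import Callable


def wait_for_task(
    fetch_status: Callable[[], str],
    done_states: set[str],
    interval_s: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status until it returns a state in done_states, then return it."""
    for attempt in range(max_attempts):
        status = fetch_status()
        if status in done_states:
            return status
        if attempt < max_attempts - 1:
            sleep(interval_s)  # wait before the next check
    raise TimeoutError(f"task not finished after {max_attempts} checks")


# Simulated task that finishes on the third check (no network involved).
fake_states = iter(["QUEUED", "PROCESSING", "SUCCEEDED"])
result = wait_for_task(
    lambda: next(fake_states),
    done_states={"SUCCEEDED", "FAILED"},
    sleep=lambda seconds: None,  # skip real waiting in this demo
)
print(result)
```

In real use, fetch_status would wrap the task-result request and return the status field from its response. Bounding the number of attempts keeps a stuck task from blocking the caller indefinitely.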

Text-to-Video Generation Example

Here’s a complete example of generating a video from text using Wan2.6’s T2V API:

Step 1: Submit Generation Request

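import requests

url = "https://api.novita.ai/v3/async/wan2.6-t2v"

# Replace each "<string>" placeholder with a real value before sending.
payload = {
    "input": {
        "prompt": "<string>",           # text description of the desired video
        "audio_url": "<string>",        # optional: HTTPS URL of an audio file to sync
        "negative_prompt": "<string>"   # elements to avoid
    },
    "parameters": {
        "seed": 123,                    # any integer, for reproducibility
        "size": "<string>",             # one of the supported T2V resolutions
        "audio": True,                  # enable audio generation
        "duration": 5,                  # 5, 10, or 15 (seconds)
        "shot_type": "<string>",
        "watermark": True,
        "prompt_extend": True           # auto-enhance the prompt
    }
}
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer <YOUR_API_KEY>"
}

response = requests.post(url, json=payload, headers=headers)

# The response body contains the task_id used in Step 2.
print(response.text)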

Step 2: Get the video generation results

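import requests

url = "https://api.novita.ai/v3/async/task-result"

# task_id is the value returned by the Step 1 request.
params = {"task_id": "<task_id>"}

headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer <YOUR_API_KEY>"
}

response = requests.get(url, params=params, headers=headers)

print(response.text)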

Key Parameters Explained

Parameter | Description | Options
prompt | Text description of desired video | Detailed scene description
audio_url | Optional audio file for sync | HTTPS URL to audio file
negative_prompt | Elements to avoid | Quality issues, unwanted objects
seed | Random seed for reproducibility | Any integer
size | Video resolution | “1280×720”, “1920×1080”, “720×1280”, etc.
duration | Video length in seconds | 5, 10, or 15
shot_type | Camera angle | “wide_shot”, “medium_shot”, “close_up”
prompt_extend | Auto-enhance prompt | true/false
watermark | Add watermark to video | true/false
audio | Enable audio generation | true/false

For complete API specifications and additional parameters, visit the Wan2.6 API Documentation.
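
The parameters above can be gathered by a small helper that builds the request body and leaves out optional fields that were not set. This is a sketch, not the platform's SDK: build_t2v_payload is a hypothetical name, it assumes the API accepts requests that omit optional fields, and the size string is passed through unchanged, so use whatever format the API documentation specifies.

```python
from typing import Any, Optional


def build_t2v_payload(
    prompt: str,
    size: str,
    duration: int = 5,
    negative_prompt: Optional[str] = None,
    audio_url: Optional[str] = None,
    seed: Optional[int] = None,
    **extra_parameters: Any,
) -> dict:
    """Assemble a request body in the input/parameters layout from Step 1."""
    if duration not in (5, 10, 15):
        raise ValueError("duration must be 5, 10, or 15")
    input_block = {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "audio_url": audio_url,
    }
    parameters = {"size": size, "duration": duration, "seed": seed, **extra_parameters}
    # Drop unset optional fields instead of sending placeholder values.
    return {
        "input": {k: v for k, v in input_block.items() if v is not None},
        "parameters": {k: v for k, v in parameters.items() if v is not None},
    }


payload = build_t2v_payload(
    "A lighthouse at dusk, waves breaking on the rocks",
    size="1280×720",  # written as in the table above; confirm the exact format
    duration=10,
    watermark=False,
)
print(payload)
```

The resulting dictionary can be passed as the json argument of requests.post in Step 1.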

Multi-Shot Prompt Structure

Wan2.6’s multi-shot capability enables you to create cohesive narrative sequences with multiple camera angles and scenes. To maximize the quality of multi-shot videos, follow this structured prompt format.

Prompt Structure Formula

Prompt = Overall Description + Shot Number + Timestamp + Shot Content

Component Breakdown

1. Overall Description

Provide a brief overview of the entire video content. This section should describe:

  • Story theme and narrative style
  • Main emotions or core events
  • Overall tone and atmosphere

This helps the AI understand the global narrative direction and maintain consistency across shots.

2. Shot Number

Assign a sequential number to each shot to:

  • Distinguish different scenes or segments
  • Organize video structure clearly
  • Maintain logical flow between transitions

3. Timestamp

Specify the exact time range for each shot within the video timeline:

  • Ensures content aligns with video timing
  • Improves generation accuracy
  • Helps with precise shot duration control

4. Shot Content

Provide detailed descriptions of each shot, including:

  • Main characters or objects and their specific behaviors
  • Actions, dialogue, expressions, and gestures
  • Camera angles and movements
  • Lighting and atmosphere details

Follow standard single-shot prompt writing conventions for this section.

Multi-Shot Prompt Example

Here’s a practical example demonstrating the complete structure:

This story is told from a third-person perspective, depicting a short drama about abandonment and the rekindling of hope.

Shot 1 [0-3 seconds]: A boy sits alone in the corner of a playground, head down, looking at a letter in his hands. He lets out a soft sigh, his eyes revealing confusion and uncertainty.

Shot 2 [3-5 seconds]: Hard cut transition, fixed camera position, focusing on the boy's eyes. Tears glisten, conveying a sense of loss and helplessness.

Shot 3 [5-10 seconds]: Hard cut transition, scene shifts to a simple classroom. A girl with gentle yet determined eyes, wearing modest clothing, approaches the boy with a warm and reassuring smile to comfort him.
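
The same structure can be assembled programmatically when prompts are generated from a shot list. The sketch below is one possible approach: build_multishot_prompt is a hypothetical helper, and its requirement that shots start at 0 and follow each other without gaps is an assumption drawn from the example above, not a documented rule.

```python
def build_multishot_prompt(overall: str, shots: list[tuple[int, int, str]]) -> str:
    """Build a prompt from an overall description and (start, end, content) shots.

    Times are in seconds; shots must be listed in playback order.
    """
    lines = [overall.strip()]
    previous_end = 0
    for number, (start, end, content) in enumerate(shots, start=1):
        if start != previous_end or end <= start:
            raise ValueError(f"shot {number} has a gap, overlap, or empty time range")
        lines.append(f"Shot {number} [{start}-{end} seconds]: {content.strip()}")
        previous_end = end
    return "\n\n".join(lines)


prompt = build_multishot_prompt(
    "A short drama about abandonment and the rekindling of hope.",
    [
        (0, 3, "A boy sits alone in the corner of a playground, looking at a letter."),
        (3, 5, "Hard cut transition, fixed camera position, focusing on the boy's eyes."),
        (5, 10, "Hard cut transition, a girl approaches the boy with a warm smile."),
    ],
)
print(prompt)
```

The final shot's end time should match the duration requested for the video.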

Conclusion

Wan2.6 on Novita AI democratizes professional video production, offering unprecedented creative control through role-playing, multi-shot narratives, and audio-visual synchronization.

Whether you’re a developer building video generation features, a marketer creating campaign content, or a filmmaker exploring pre-visualization, Novita AI’s Model API platform eliminates infrastructure complexity while delivering cinema-quality results.

Start generating professional videos today and transform your creative vision into reality within minutes.

Ready to get started? Create your Novita AI account and access Wan2.6 with free credits to experience the future of AI video generation.

Novita AI is a leading AI cloud platform that provides developers with easy-to-use APIs and affordable, reliable GPU infrastructure for building and scaling AI applications.

