Choose Vidu Q3 Turbo first when you need lower cost, frequent iteration, or high-volume video tests; choose Vidu Q3 Pro when you accept its higher per-second price and want to compare it against Turbo in a final creative pass. On Novita AI, both variants expose text-to-video, image-to-video, and start-end-to-video endpoints and support asynchronous generation, and each variant is billed per second at the same rate across those three modes.
Vidu Q3 Pro vs Turbo selection summary
The clearest source-backed difference between Vidu Q3 Pro and Vidu Q3 Turbo on Novita AI is pricing. The public Novita AI pricing payload lists Turbo at lower per-second rates than Pro for 540p, 720p, and 1080p. The API docs also show that both variants are available through separate asynchronous endpoints for text-to-video, image-to-video, and start-end-to-video.
| Decision point | Start with Vidu Q3 Turbo | Start with Vidu Q3 Pro |
|---|---|---|
| Main goal | Explore prompts, run more variants, reduce per-second spend | Compare the Pro variant for final candidate clips |
| Budget profile | Lower peak and off-peak prices at every listed resolution | Higher per-second prices at every listed resolution |
| API modes on Novita AI | Text-to-video, image-to-video, start-end-to-video | Text-to-video, image-to-video, start-end-to-video |
| Output options in docs | Up to 1080p; 1-16 seconds | Up to 1080p; 1-16 seconds |
| Audio support in docs | Q3 audio-video generation controls are available | Q3 audio-video generation controls are available |
| Best first test | High-volume iteration, prompt search, rough cuts, social variants | Final comparison pass after Turbo narrows the prompt and mode |
Turbo and Pro are better viewed as two pricing and workflow options than as a simple good-versus-bad ranking. The public docs and pricing pages support a cost and endpoint comparison, but they do not publish a universal benchmark, latency score, or scene-quality ranking that settles the question for every prompt. If the output really matters, the more reliable way to decide is to run the same prompt or image set through both variants and compare the results side by side.
Vidu Q3 text-to-video, image-to-video, and start-end modes
Vidu Q3 is not a single setup. On Novita AI, the useful choice is two-dimensional: pick Pro or Turbo, then pick the generation mode that matches your source material.
| Mode | What you provide | Use it when | Pro endpoint | Turbo endpoint |
|---|---|---|---|---|
| Text-to-video | A text prompt | You are exploring a new scene, character, camera move, ad concept, or storyboard idea from scratch | /v3/async/vidu-q3-pro-t2v | /v3/async/vidu-q3-turbo-t2v |
| Image-to-video | One reference image plus optional motion prompt | You already have a product image, character frame, style reference, or still composition to animate | /v3/async/vidu-q3-pro-i2v | /v3/async/vidu-q3-turbo-i2v |
| Start-end-to-video | Two images, one start frame and one end frame | You need the model to interpolate between a known first and last frame | /v3/async/vidu-q3-pro-f2v | /v3/async/vidu-q3-turbo-f2v |
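The six endpoint paths in the table follow one naming pattern, so a script can derive the path from the two choices. The sketch below is a minimal illustration; the function name `vidu_endpoint` and the short mode codes used as arguments are hypothetical helpers, while the paths themselves come from the table.

```shell
# Map a variant (pro|turbo) and a mode (t2v|i2v|f2v) to the endpoint path
# listed in the table. vidu_endpoint is a hypothetical helper name.
vidu_endpoint() {
  local variant="$1" mode="$2"
  case "$variant" in
    pro|turbo) ;;
    *) echo "unknown variant: $variant" >&2; return 1 ;;
  esac
  case "$mode" in
    t2v|i2v|f2v) ;;
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
  printf '/v3/async/vidu-q3-%s-%s\n' "$variant" "$mode"
}

vidu_endpoint turbo t2v   # prints /v3/async/vidu-q3-turbo-t2v
```

Rejecting unknown values up front keeps a typo from turning into a request against a path that does not exist.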
For text-to-video, the docs list a required prompt, an audio boolean, duration, resolution, aspect_ratio, off_peak, and watermark controls. Pro text-to-video accepts prompts up to 2,000 characters; Turbo text-to-video accepts prompts up to 5,000 characters.
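Because the two variants have different prompt limits, a prompt that works on Turbo can be too long for Pro. A rough pre-flight check might look like the sketch below. The function name `prompt_fits` is hypothetical, and it assumes the limits are counted the same way the shell counts characters, which may not match how the API counts them.

```shell
# Check a text-to-video prompt against the per-variant limits quoted above:
# 2,000 characters for Pro, 5,000 for Turbo. prompt_fits is a hypothetical name.
prompt_fits() {
  local variant="$1" prompt="$2" limit
  case "$variant" in
    pro)   limit=2000 ;;
    turbo) limit=5000 ;;
    *) echo "unknown variant: $variant" >&2; return 2 ;;
  esac
  # ${#prompt} counts characters in the current locale (bytes in the C locale).
  [ "${#prompt}" -le "$limit" ]
}

# Example: a 3,000-character prompt fits Turbo but not Pro.
long_prompt=$(printf '%3000s' '' | tr ' ' a)
if prompt_fits turbo "$long_prompt"; then echo "turbo: ok"; fi
if ! prompt_fits pro "$long_prompt"; then echo "pro: too long"; fi
```

Running this check before the final Pro comparison avoids having to shorten a prompt late, which would change the input between the two runs.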
For image-to-video, the docs require an images array. Pro image-to-video currently supports one image input, with JPG, JPEG, PNG, and WebP accepted, a maximum 50 MB per image, and an aspect ratio between 1:4 and 4:1. The Pro image-to-video docs list audio as a custom audio URL field for background music. Turbo image-to-video also uses a reference image array, supports the same listed image formats and 50 MB limit, and lists an audio boolean plus an audio_type option: all, speech_only, or sound_effect_only.
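The image constraints can be checked locally before uploading. The sketch below is one way to do it, under stated assumptions: `image_ok` is a hypothetical helper, the caller supplies the file size and pixel dimensions, 50 MB is treated as 50 × 1024 × 1024 bytes, and the 1:4 to 4:1 range is read as width:height. The aspect-ratio rule is the one listed for Pro image-to-video.

```shell
# Validate an image against the limits described above.
# Arguments: file name, size in bytes, width in pixels, height in pixels.
image_ok() {
  local file="$1" bytes="$2" width="$3" height="$4" ext
  # Compare the extension case-insensitively.
  ext=$(printf '%s' "${file##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    jpg|jpeg|png|webp) ;;
    *) echo "unsupported format: $ext" >&2; return 1 ;;
  esac
  # Assumption: 50 MB means 50 * 1024 * 1024 bytes.
  if [ "$bytes" -gt $((50 * 1024 * 1024)) ]; then
    echo "file larger than 50 MB" >&2; return 1
  fi
  # width:height must stay between 1:4 and 4:1 (integer arithmetic only).
  if [ $((width * 4)) -lt "$height" ] || [ "$width" -gt $((height * 4)) ]; then
    echo "aspect ratio outside 1:4 to 4:1" >&2; return 1
  fi
}

if image_ok product.PNG 1048576 1920 1080; then echo "image accepted"; fi
```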
For start-end-to-video, both Pro and Turbo docs require exactly two images: the first image is the start frame and the second image is the end frame. The docs list 1-16 second duration and 540p, 720p, and 1080p resolution options. Use this mode when you care about where a transition begins and ends more than you care about discovering a scene from a blank prompt.
Vidu Q3 Pro and Turbo pricing
Novita AI lists Vidu Q3 Pro and Vidu Q3 Turbo pricing per second. Public pricing, checked on June 23, 2026, shows the same rates across text-to-video, image-to-video, and start-end-to-video for each variant and resolution.
| Resolution | Vidu Q3 Pro peak | Vidu Q3 Pro off-peak | Vidu Q3 Turbo peak | Vidu Q3 Turbo off-peak |
|---|---|---|---|---|
| 540p | $0.0625/s | $0.0313/s | $0.0357/s | $0.0179/s |
| 720p | $0.1339/s | $0.0670/s | $0.0536/s | $0.0268/s |
| 1080p | $0.1429/s | $0.0714/s | $0.0714/s | $0.0357/s |
Here is what that means for common test clips:
| Test clip | Pro peak | Pro off-peak | Turbo peak | Turbo off-peak |
|---|---|---|---|---|
| 5 seconds at 540p | $0.3125 | $0.1565 | $0.1785 | $0.0895 |
| 10 seconds at 720p | $1.3390 | $0.6700 | $0.5360 | $0.2680 |
| 16 seconds at 1080p | $2.2864 | $1.1424 | $1.1424 | $0.5712 |
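Each figure in the table above is the per-second rate multiplied by the clip duration. To estimate other durations, the same arithmetic can be scripted. This is a minimal sketch: the function names `rate_per_second` and `clip_cost` and the `peak`/`off_peak` labels are hypothetical, and the rates are copied from the pricing table, so they need updating if the published prices change.

```shell
# Look up the USD-per-second rate from the pricing table.
# Arguments: variant (pro|turbo), resolution (540p|720p|1080p), window (peak|off_peak).
rate_per_second() {
  case "$1:$2:$3" in
    pro:540p:peak)        echo 0.0625 ;;
    pro:540p:off_peak)    echo 0.0313 ;;
    pro:720p:peak)        echo 0.1339 ;;
    pro:720p:off_peak)    echo 0.0670 ;;
    pro:1080p:peak)       echo 0.1429 ;;
    pro:1080p:off_peak)   echo 0.0714 ;;
    turbo:540p:peak)      echo 0.0357 ;;
    turbo:540p:off_peak)  echo 0.0179 ;;
    turbo:720p:peak)      echo 0.0536 ;;
    turbo:720p:off_peak)  echo 0.0268 ;;
    turbo:1080p:peak)     echo 0.0714 ;;
    turbo:1080p:off_peak) echo 0.0357 ;;
    *) echo "no rate for $1 $2 $3" >&2; return 1 ;;
  esac
}

# Cost of one clip: rate multiplied by duration in seconds.
clip_cost() {
  local rate
  rate=$(rate_per_second "$1" "$2" "$3") || return 1
  LC_ALL=C awk -v r="$rate" -v s="$4" 'BEGIN { printf "%.4f\n", r * s }'
}

clip_cost pro 720p peak 10      # prints 1.3390
clip_cost turbo 540p off_peak 5 # prints 0.0895
```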
Off-peak mode makes the most sense when turnaround is flexible. The Vidu Q3 API docs describe off-peak tasks as lower-cost tasks processed within 48 hours, which can work well when you are exploring prompts and want a broader batch of tests at a lower cost. If you are building a user-facing flow, peak mode is still the safer default unless delayed delivery is already part of the product experience.
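One simple way to encode that rule is to set the `off_peak` request field from the time you can afford to wait. The sketch below assumes whole hours and uses the 48-hour processing window described above as the threshold; `off_peak_flag` is a hypothetical helper name.

```shell
# Return "true" when the caller can wait at least the 48-hour off-peak window,
# otherwise "false". Expects a whole number of hours.
off_peak_flag() {
  local hours_available="$1"
  if [ "$hours_available" -ge 48 ]; then
    echo true
  else
    echo false
  fi
}

# Build the fragment of the request body that carries the flag.
printf '{"off_peak": %s}\n' "$(off_peak_flag 72)"   # prints {"off_peak": true}
```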
Which Vidu Q3 mode should you test first?
The easiest way to choose a mode is to start with the input you already have. A lot of disappointing tests come from picking the most exciting option first, instead of the one that best matches the material on hand.
| Situation | First mode to test | Recommended variant | Why |
|---|---|---|---|
| You only have a written idea | Text-to-video | Turbo | It lets you explore more prompt directions at a lower per-second cost. |
| You have a product render or character still | Image-to-video | Turbo first, then Pro for finalists | The reference image constrains the visual target, and Turbo keeps iteration cheaper. |
| You have a storyboard with a known first and last frame | Start-end-to-video | Turbo first, then Pro if needed | The two images give the model explicit endpoints, which is useful for controlled transitions. |
| You need a silent clip for later editing | Text-to-video or image-to-video with audio disabled | Turbo | The docs expose an audio control, so you can avoid generating audio you will replace. |
| You are deciding between final candidate clips | Same mode in both variants | Pro and Turbo side by side | Use identical inputs and compare outputs for your scene instead of relying on generic assumptions. |
If you are new to Vidu Q3 on Novita AI, this is usually the smoothest way to start:
- Run Turbo text-to-video at 540p or 720p to find the prompt direction.
- Move to image-to-video if you need identity, product, or visual-style control from a still image.
- Use start-end-to-video only when you have a real first frame and last frame.
- Re-run your strongest candidate in Pro at the target resolution before deciding whether the higher price is justified for that scene.
That sequence keeps the more expensive comparison step close to the final decision, when you already have a promising direction. It also helps you avoid spending Pro budget on early prompt exploration that you may end up discarding anyway.
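To put a number on that, here is a rough budget comparison for a made-up workload: a batch of exploration clips on Turbo at 540p followed by a few final reruns on Pro at 1080p, versus running every clip on Pro at 1080p. The clip counts and the helper name `staged_budget` are hypothetical; the rates are the peak prices from the pricing table.

```shell
# Compare a staged workflow with an all-Pro workflow.
# Arguments: number of exploration clips, number of final reruns, seconds per clip.
staged_budget() {
  LC_ALL=C awk -v explore="$1" -v finals="$2" -v secs="$3" 'BEGIN {
    turbo_540p = 0.0357   # Turbo peak rate, USD per second at 540p
    pro_1080p  = 0.1429   # Pro peak rate, USD per second at 1080p
    staged  = explore * secs * turbo_540p + finals * secs * pro_1080p
    all_pro = (explore + finals) * secs * pro_1080p
    printf "staged=%.4f all_pro=%.4f\n", staged, all_pro
  }'
}

staged_budget 20 2 5   # prints staged=4.9990 all_pro=15.7190
```

The two totals are not a like-for-like quality comparison, since the exploration clips run at a lower resolution; the point is only how much of the spend moves to the end of the process.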
Vidu Q3 API endpoints and request flow
All six Vidu Q3 endpoints in this comparison use Novita AI’s v3 asynchronous task pattern. You submit a generation request, receive a task_id, then call the Task Result API with that task_id to retrieve the generated video when the task succeeds.
| Endpoint | Method | Result pattern |
|---|---|---|
| /v3/async/vidu-q3-pro-t2v | POST | Returns task_id |
| /v3/async/vidu-q3-pro-i2v | POST | Returns task_id |
| /v3/async/vidu-q3-pro-f2v | POST | Returns task_id |
| /v3/async/vidu-q3-turbo-t2v | POST | Returns task_id |
| /v3/async/vidu-q3-turbo-i2v | POST | Returns task_id |
| /v3/async/vidu-q3-turbo-f2v | POST | Returns task_id |
| /v3/async/task-result | GET | Returns task status and generated media when available |
A minimal Turbo text-to-video request looks like this:
    curl --request POST \
      --url https://api.novita.ai/v3/async/vidu-q3-turbo-t2v \
      --header "Authorization: Bearer $NOVITA_API_KEY" \
      --header "Content-Type: application/json" \
      --data '{
        "prompt": "A close-up product launch video on a clean studio table, soft camera push-in, subtle lighting movement",
        "duration": 5,
        "resolution": "720p",
        "aspect_ratio": "16:9",
        "audio": true,
        "off_peak": false
      }'
Then poll the task result endpoint:
    curl --request GET \
      --url "https://api.novita.ai/v3/async/task-result?task_id=$NOVITA_TASK_ID" \
      --header "Authorization: Bearer $NOVITA_API_KEY"
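In a script, the single poll above usually becomes a loop that waits until the task finishes. The following is a sketch under stated assumptions, not the documented client flow: the helper names are hypothetical, the submission response is assumed to contain a `"task_id": "..."` string field, and the status strings `TASK_STATUS_SUCCEED` and `TASK_STATUS_FAILED` are assumptions that should be checked against the Task Result API reference. A JSON tool such as `jq` would be more robust than the `sed` and pattern matching used here.

```shell
# Pull the task_id value out of a submission response such as {"task_id":"abc"}.
extract_task_id() {
  printf '%s' "$1" |
    sed -n 's/.*"task_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Classify a task-result response body. The status strings are assumptions.
task_state() {
  case "$1" in
    *TASK_STATUS_SUCCEED*) echo done ;;
    *TASK_STATUS_FAILED*)  echo failed ;;
    *)                     echo pending ;;
  esac
}

# Poll until the task finishes, fails, or the retry budget runs out.
# Arguments: task_id, seconds between polls (default 10), max polls (default 60).
poll_task() {
  local task_id="$1" interval="${2:-10}" max_tries="${3:-60}" body i=0
  while [ "$i" -lt "$max_tries" ]; do
    body=$(curl --silent --show-error --fail \
      --url "https://api.novita.ai/v3/async/task-result?task_id=$task_id" \
      --header "Authorization: Bearer $NOVITA_API_KEY") || return 1
    case "$(task_state "$body")" in
      done)   printf '%s\n' "$body"; return 0 ;;
      failed) printf '%s\n' "$body" >&2; return 1 ;;
    esac
    i=$((i + 1))
    sleep "$interval"
  done
  echo "timed out waiting for task $task_id" >&2
  return 1
}

# Usage, given $response from the POST request:
#   task_id=$(extract_task_id "$response") && poll_task "$task_id" 5 120
```

A bounded retry count matters more for off-peak tasks, where a result can legitimately take much longer than a peak request.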
For image-to-video, replace the endpoint with the I2V endpoint and provide the images array. For start-end-to-video, use the F2V endpoint and provide two images in order: start frame first, end frame second.
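As an illustration of the start-end case, the sketch below builds a request body with two images in the required order. It rests on assumptions: the helper name `f2v_payload` is hypothetical, the array field is assumed to be called `images` as in the image-to-video docs, and the images are assumed to be passed as URL strings. Values are inserted verbatim, so they must not contain quotes or backslashes.

```shell
# Build a start-end-to-video request body: start frame first, end frame second.
# Arguments: start image URL, end image URL, duration (default 5), resolution (default 720p).
f2v_payload() {
  local start="$1" end="$2" duration="${3:-5}" resolution="${4:-720p}"
  printf '{"images":["%s","%s"],"duration":%s,"resolution":"%s"}\n' \
    "$start" "$end" "$duration" "$resolution"
}

# Placeholder URLs for illustration only.
f2v_payload https://example.com/start.png https://example.com/end.png 8 1080p

# The body can then be sent the same way as the text-to-video request:
#   curl --request POST \
#     --url https://api.novita.ai/v3/async/vidu-q3-turbo-f2v \
#     --header "Authorization: Bearer $NOVITA_API_KEY" \
#     --header "Content-Type: application/json" \
#     --data "$(f2v_payload "$START_URL" "$END_URL")"
```

Taking the two frames as separate, named arguments makes it harder to swap the start and end images by accident.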
Practical Vidu Q3 test plan
Use a small test matrix instead of one-off impressions. The goal is not to prove a universal winner; it is to choose the right variant and mode for your use case.
| Test pass | Variant | Mode | Resolution | What to evaluate |
|---|---|---|---|---|
| Prompt search | Turbo | Text-to-video | 540p or 720p | Which prompt structure gives the right scene, motion, and framing? |
| Reference control | Turbo | Image-to-video | 720p | Does the model preserve the subject or product enough for your use case? |
| Transition control | Turbo | Start-end-to-video | 720p | Does the motion between first and last frame feel usable? |
| Final comparison | Turbo and Pro | Same winning mode | Target resolution | Is the Pro result worth the higher per-second cost for this scene? |
| Cost pass | Winning variant | Same winning mode | Target resolution | Should this run peak, or can it move to off-peak? |
When you compare Pro and Turbo, keep these variables the same:
- Same prompt, image inputs, duration, resolution, and aspect ratio.
- Same audio setting.
- Same off-peak setting when you are comparing output results.
- Same evaluation criteria: identity consistency, motion clarity, camera movement, audio usefulness, and editability.
If you change the prompt and the model variant at the same time, the comparison gets muddy, because you can no longer tell which change actually improved the result.
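One way to enforce that discipline is to define the request body once and vary only the endpoint. The sketch below prints the two requests a comparison run would send; `paired_requests` is a hypothetical helper and the prompt is a placeholder. For image-to-video, note that the Pro and Turbo docs describe the audio field differently, so an identical body may need the audio setting removed or adjusted per variant.

```shell
# Print one line per variant: endpoint path, a tab, then the shared request body.
# Arguments: mode (t2v|i2v|f2v), JSON request body.
paired_requests() {
  local mode="$1" payload="$2" variant
  for variant in turbo pro; do
    printf '%s\t%s\n' "/v3/async/vidu-q3-$variant-$mode" "$payload"
  done
}

# The body is defined once, so both variants receive exactly the same inputs.
payload='{"prompt":"A slow push-in on a ceramic mug","duration":5,"resolution":"720p","aspect_ratio":"16:9","audio":false,"off_peak":false}'
paired_requests t2v "$payload"
```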
FAQ
Is Vidu Q3 Turbo cheaper than Vidu Q3 Pro on Novita AI?
Yes. Novita AI pricing, checked on June 23, 2026, lists Turbo below Pro at 540p, 720p, and 1080p for text-to-video, image-to-video, and start-end-to-video.
Do Vidu Q3 Pro and Turbo support the same modes?
Novita AI docs list separate Pro and Turbo endpoints for text-to-video, image-to-video, and start-end-to-video. Each endpoint returns a task_id and uses the v3 asynchronous task result flow.
Should I use text-to-video or image-to-video first?
Use text-to-video first when you only have an idea or written scene. Use image-to-video first when a reference image matters, such as a product shot, character frame, or fixed visual style.
When should I use start-end-to-video?
Use start-end-to-video when you have two frames and need the model to create the motion between them. It is the most structured of the three modes because both the first and the last frame are specified.
Does Vidu Q3 support audio controls?
Yes. The Vidu Q3 docs include audio controls. Text-to-video and start-end-to-video expose an audio boolean. Pro image-to-video lists audio as a custom audio URL field for background music, while Turbo image-to-video lists an audio boolean plus audio_type options for all, speech_only, and sound_effect_only.
Should I run both Vidu Q3 Turbo and Pro for the same prompt?
Run Turbo first when you are exploring prompts, references, durations, and aspect ratios. If one result is close to what you need, rerun the same setup on Pro so the comparison isolates the model variant instead of mixing prompt and input changes.
