Key Highlights
Multimodal + Long Video Understanding
- Supports images, documents, and long videos
- Suitable for education, media, and surveillance
Accurate Localization + Structured Output
- Detects and localizes objects precisely
- Extracts structured data from invoices, forms, and charts
- Useful in finance, law, and logistics
You can start a free trial of Qwen2.5-VL-72B on the Novita AI API in just a few steps!
What is Qwen2.5-VL-72B?
Qwen2.5-VL-72B-Instruct is a powerful 72B-parameter large vision-language model (LVLM) fine-tuned for instruction-following tasks. It accepts both textual and visual inputs (images and videos), making it well suited to multimodal reasoning, document understanding, video analysis, and agentic interaction.
An Example to Show Qwen2.5-VL-72B’s Ability
Input: Given the query "the user is experiencing the image generation feature", when does the described content occur in the video? Use seconds for the time format.
Output: The described content occurs from 28 seconds to 50 seconds in the video. During this segment, the user interacts with the image generation feature, requesting and receiving an artistic double scene painting of a mountain during day and night. The user then adds a bird to the generated image, demonstrating the functionality of the image generation tool.
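If you want to reproduce this kind of temporal-grounding query yourself, here is a minimal sketch of how the prompt can be phrased as a chat message, using the same messages format as the Transformers examples later in this post. The video path and exact wording are illustrative placeholders, not part of the model's API.

```python
# Illustrative only: a temporal-grounding prompt in the standard chat-message format.
# The video path is a placeholder; see the full loading/inference code later in this post.
messages = [{
    "role": "user",
    "content": [
        {"type": "video", "video": "path_or_url_to_video.mp4"},
        {"type": "text", "text": (
            "Given the query 'the user is experiencing the image generation feature', "
            "when does the described content occur in the video? "
            "Use seconds for the time format."
        )},
    ],
}]
```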
Qwen2.5-VL-72B Overview
| Category | Item | Details |
|---|---|---|
| Basic Info | Release Date | January 28, 2025 |
| Basic Info | Model Size | 73.4B parameters |
| Basic Info | Open Source | Yes (released by Qwen) |
| Architecture | Core Components | Dynamic resolution & frame-rate training; SwiGLU + RMSNorm + window attention; dynamic FPS sampling |
| Language Support | Supported Languages | Excels at multilingual document and scene-text recognition |
| Multimodal | Capability | Visual (images & videos) and textual inputs |
| Context | Context Window | Configurable up to 64K tokens for long videos |
| Precision | Tensor Type | BF16 |
| Benchmarks | MMMU (Image) | 70.2 (Qwen2.5-VL-72B) vs 70.3 (GPT-4o) |
| Benchmarks | MVBench (Video) | 70.4 (Qwen2.5-VL-72B) vs 64.6 (GPT-4o) |
| Benchmarks | AITZ_EM (Agent) | 83.2 (Qwen2.5-VL-72B) vs 35.3 (GPT-4o) |
How to Access Qwen2.5-VL-72B Locally?
Qwen2.5-VL-72B Hardware Requirements
| GPU | Recommended Configuration |
|---|---|
| Nvidia A100 (80 GB) | 8 GPUs × 80 GB = 640 GB total VRAM |
| Nvidia H100 (80 GB) | 8 GPUs × 80 GB = 640 GB total VRAM |
| Nvidia RTX 4090 (24 GB) | 24 GPUs × 24 GB = 576 GB total VRAM |
| Nvidia L40S (48 GB) | 8 GPUs × 48 GB = 384 GB total VRAM |
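To see why multi-GPU setups of this size are needed, a rough back-of-the-envelope calculation helps: at BF16 precision (2 bytes per parameter), the 73.4B weights alone occupy roughly 147 GB, before any KV cache, vision-encoder activations, or long-video inputs. The sketch below is a rule-of-thumb estimate only; the configurations in the table above include substantial headroom beyond this floor.

```python
import math

# Back-of-the-envelope VRAM math for Qwen2.5-VL-72B weights in BF16.
# Real deployments (see the table above) add large headroom for the KV cache,
# activations, and long-video inputs, so treat this as a lower bound only.
params = 73.4e9                # parameter count from the model card
bytes_per_param = 2            # BF16 = 2 bytes per parameter
weights_gb = params * bytes_per_param / 1e9
print(f"Weights alone: ~{weights_gb:.0f} GB")          # ~147 GB

# Minimum number of cards needed just to hold the weights, per GPU type.
for name, vram in [("A100/H100 80 GB", 80), ("L40S 48 GB", 48), ("RTX 4090 24 GB", 24)]:
    print(f"{name}: >= {math.ceil(weights_gb / vram)} GPUs for weights only")
```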
Install Qwen2.5-VL-72B locally
1. Install Dependencies

```bash
# Install the latest Hugging Face Transformers from source (required for Qwen2.5-VL)
pip install git+https://github.com/huggingface/transformers accelerate

# Install the vision utility toolkit (the decord extra is recommended for fast video loading)
pip install 'qwen-vl-utils[decord]==0.0.8'
```
2. Using Qwen2.5-VL for Visual Question Answering

```python
import torch
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

# Model name (a local path also works). The 72B checkpoint must be sharded across
# multiple GPUs; device_map="auto" spreads it over the available cards.
model_name = "Qwen/Qwen2.5-VL-72B-Instruct"

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
).eval()
processor = AutoProcessor.from_pretrained(model_name)

# Build a chat message with an image (local path, URL, or base64) and a question.
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"},
        {"type": "text", "text": "What is happening in the image?"},
    ],
}]

# Convert the messages into model inputs.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs,
                   padding=True, return_tensors="pt").to(model.device)

# Inference
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128)

# Decode only the newly generated tokens and print the answer.
response = processor.batch_decode(output_ids[:, inputs.input_ids.shape[1]:],
                                  skip_special_tokens=True)[0]
print("Answer:", response)
```
3. Video Input Example

```python
# Reuse the model and processor loaded above; the video can be a local path or URL.
messages = [{"role": "user", "content": [
    {"type": "video", "video": "path_or_url_to_video.mp4"},
    {"type": "text", "text": "Summarize the video content."},
]}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs,
                   padding=True, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128)
print("Answer:", processor.batch_decode(output_ids[:, inputs.input_ids.shape[1]:],
                                        skip_special_tokens=True)[0])
```
How to Access Qwen2.5-VL-72B via Novita API?
Step 1: Log In and Access the Model Library
Log in to your account and click on the Model Library button.

Step 2: Start Your Free Trial
Begin your free trial to explore the capabilities of the selected model.

Step 3: Get Your API Key
To authenticate with the API, you need an API key. Go to the “Settings” page and copy your key as shown in the image.

Step 4: Install the SDK
Install the SDK using the package manager for your programming language.
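For Python, the endpoint used in the example below is OpenAI-compatible (note the base_url), so the standard openai package is all you need; this assumes a recent Python environment with pip available.

```bash
# The Novita endpoint used below is OpenAI-compatible, so the official
# openai Python package is sufficient for the example in this section.
pip install openai
```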

After installation, import the necessary libraries into your development environment and initialize the client with your API key to start interacting with Novita AI LLMs. Below is an example of using the Chat Completions API in Python.
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",
    api_key="<YOUR Novita AI API Key>",
)

model = "qwen/qwen2.5-vl-72b-instruct"
stream = True  # or False
max_tokens = 2048
system_content = """Be a helpful assistant"""
temperature = 1
top_p = 1
min_p = 0
top_k = 50
presence_penalty = 0
frequency_penalty = 0
repetition_penalty = 1
response_format = {"type": "text"}

chat_completion_res = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "system",
            "content": system_content,
        },
        {
            "role": "user",
            "content": "Hi there!",
        },
    ],
    stream=stream,
    max_tokens=max_tokens,
    temperature=temperature,
    top_p=top_p,
    presence_penalty=presence_penalty,
    frequency_penalty=frequency_penalty,
    response_format=response_format,
    extra_body={
        "top_k": top_k,
        "repetition_penalty": repetition_penalty,
        "min_p": min_p,
    },
)

if stream:
    for chunk in chat_completion_res:
        print(chunk.choices[0].delta.content or "", end="")
else:
    print(chat_completion_res.choices[0].message.content)
```
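Since Qwen2.5-VL is a vision model, you will usually want to send an image along with your prompt. OpenAI-compatible chat endpoints generally express this as a content list containing an image_url part; whether Novita AI accepts exactly this multimodal format for the model should be confirmed against their API docs, so treat the following as a hedged sketch rather than guaranteed syntax.

```python
# Hedged sketch: sending an image with the prompt via the OpenAI-style
# "image_url" content part. Confirm the exact multimodal format in Novita's docs.
vision_res = client.chat.completions.create(
    model="qwen/qwen2.5-vl-72b-instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is happening in this image?"},
            {"type": "image_url",
             "image_url": {"url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"}},
        ],
    }],
    max_tokens=512,
)
print(vision_res.choices[0].message.content)
```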
Using Qwen2.5-VL-72B via Cloud GPU
Step 1: Register an Account
If you’re new to Novita AI, begin by creating an account on our website. Once you’re registered, head to the “GPUs” tab to explore available resources and start your journey.

Step 2: Explore Templates and GPU Servers
Start by selecting a template that matches your project needs, such as PyTorch, TensorFlow, or CUDA. Choose the version that fits your requirements, like PyTorch 2.2.1 or CUDA 11.8.0. Then, select the A100 GPU server configuration, which offers powerful performance to handle demanding workloads with ample VRAM, RAM, and disk capacity.

Step 3: Tailor Your Deployment
After selecting a template and GPU, customize your deployment settings by adjusting options such as the image version (e.g., CUDA 11.8). You can also tweak other configurations to tailor the environment to your project’s specific requirements.

Step 4: Launch an Instance
Once you’ve finalized the template and deployment settings, click “Launch Instance” to set up your GPU instance. This will start the environment setup, enabling you to begin using the GPU resources for your AI tasks.

Qwen2.5-VL-72B-Instruct delivers cutting-edge performance across a wide range of vision-language tasks. Whether you’re automating workflows in finance or analyzing videos in real time, it combines depth, scale, and flexibility. With open-source access and multiple deployment paths—local GPU, cloud instances, or API—Qwen2.5-VL empowers developers and enterprises to build smarter, more capable AI systems.
Frequently Asked Questions
Can I run Qwen2.5-VL-72B locally?
Yes. You can run it on machines with sufficient VRAM (e.g., 8×A100 or 24×RTX 4090 GPUs).
How can I access Qwen2.5-VL-72B via an API?
You can access Qwen2.5-VL-72B-Instruct via Novita AI’s Model Library, start a free trial, and get an API key for fast integration.
What is the difference between the base model and the Instruct version?
The base model handles general vision-language tasks; the “Instruct” version is fine-tuned to follow user instructions more accurately.
What is Novita AI?
Novita AI is an AI cloud platform that offers developers an easy way to deploy AI models using our simple API, while also providing an affordable and reliable GPU cloud for building and scaling.
Recommended Reading
- Qwen2.5-VL: Powerful but RAM-Hungry Vision-Language Model
- Qwen 2.5 72b vs Llama 3.3 70b: Which Model Suits Your Needs?
- Qwen 2.5 vs Llama 3.2 90B: A Comparative Analysis of Coding and Image Reasoning Capabilities