How to Access Qwen 3 Locally or via API: A Complete Guide


Refer your friends to Novita AI and both of you will earn $10 in LLM API credits—up to $500 in total rewards.

To support the developer community, Qwen2.5-7B, Qwen3-0.6B, Qwen3-1.7B, and Qwen3-4B are currently available for free on Novita AI.


Qwen 3 is a versatile and powerful open-source language model family built by Alibaba. With cutting-edge architecture and dual-mode reasoning, it’s designed to serve both edge devices and large-scale enterprise needs. This article explores its capabilities, model types, and how to use it—either locally or through API.

What is Qwen 3?

Qwen 3 is Alibaba’s 2025 open-source large language model family, featuring switchable “thinking” and “non-thinking” modes for enhanced reasoning and multilingual performance across 119 languages. The lineup spans dense models from 0.6B to 32B parameters plus Mixture-of-Experts variants at 30B and 235B total parameters.

Qwen 3 – Shared Features

Open‑source & Commercial‑friendly

Apache 2.0 license, freely available weights for research and business use.

Efficient Transformer Core

A decoder-only Transformer with Grouped-Query Attention, which shrinks KV-cache memory for long contexts of up to 128K tokens.

Dual “Thinking / Non‑thinking” Modes

Detailed chain‑of‑thought when needed, snappy direct answers when speed matters.

Massive 36 T‑token Corpus

119 languages with expanded STEM & code data for stronger reasoning and programming skills.

Three‑Stage Pre‑training

Base skills → STEM enrichment → 32 K‑token long‑context adaptation.

Four‑Stage Post‑training

Long CoT SFT → reasoning RL → mode fusion → general RLHF alignment.

Multilingual Instruction Following

Strong in English & Chinese, robust across 100+ languages for global applications.

Tool / Agent Readiness

Built‑in function‑calling schema to decide and format external tool invocations.

Text‑in / Text‑out Modality

Optimized for language tasks today; vision variants planned for future releases.
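The dual-mode behavior described above can be controlled per turn: besides the chat template’s `enable_thinking` flag, the instruct models honor “/think” and “/no_think” soft switches placed inside a user message. A minimal sketch of building such prompts (message construction only; the helper `make_messages` is an illustrative name, not part of any library):

```python
# Sketch: steering Qwen 3's thinking mode with per-turn soft switches.
# "/think" requests chain-of-thought; "/no_think" requests a direct answer.
def make_messages(prompt, thinking=None):
    """Build an OpenAI-style message list, optionally appending a soft switch."""
    if thinking is True:
        prompt = f"{prompt} /think"
    elif thinking is False:
        prompt = f"{prompt} /no_think"
    return [{"role": "user", "content": prompt}]

# Quick factual question: skip the reasoning trace for a snappy reply.
fast = make_messages("What is 2 + 2?", thinking=False)
# Hard problem: ask for detailed chain-of-thought.
deep = make_messages("Prove there are infinitely many primes.", thinking=True)

print(fast[0]["content"])
print(deep[0]["content"])
```

Pass the resulting list to any Qwen 3 chat endpoint or to `tokenizer.apply_chat_template` when running locally.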

Qwen 3 Series Architecture


Qwen 3 Series Benchmark


Higher-parameter models such as Qwen3-32B and Qwen3-14B follow instructions consistently, with larger models and reasoning-enabled versions scoring higher. The weaker results from low-parameter models likely stem from limited reasoning capacity: they cannot fully exploit the thinking mechanism, which leads to suboptimal performance.

How to Access Qwen 3 Locally?

Hardware Requirements

Model             | Recommended GPU          | VRAM     | vCPUs | RAM    | Storage
Qwen3-0.6B        | RTX 3060 / T4            | 8 GB     | 4     | 8 GB   | 20 GB
Qwen3-1.7B        | RTX 3060 / A5000         | 12–24 GB | 6–8   | 16 GB  | 30 GB
Qwen3-4B          | A100 40GB / RTX 3090     | 24–40 GB | 12+   | 24 GB  | 40 GB
Qwen3-8B          | A100 80GB / H100         | 40–80 GB | 16+   | 48 GB  | 60 GB
Qwen3-14B         | 2× A100 80GB / 1× H100   | 80 GB+   | 24+   | 64 GB  | 80 GB
Qwen3-30B (MoE)   | 2× H100 / 4× A100        | 160 GB   | 48+   | 128 GB | 160 GB
Qwen3-32B         | 2× H100 / 4× A100        | 160 GB   | 64    | 160 GB | 200 GB
Qwen3-235B (MoE)  | 8× H100 / 8× A100        | 640 GB   | 128+  | 512 GB | 500+ GB
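As a rough cross-check of the VRAM column, weight memory for a dense model in bf16 is about 2 bytes per parameter; the sketch below adds a ballpark 20% overhead for activations and runtime buffers. Both factors are assumptions for estimation only, and real deployments need extra headroom for long-context KV caches:

```python
# Back-of-envelope VRAM estimate for dense Qwen 3 checkpoints.
# Assumes bf16 weights (2 bytes/param) plus ~20% runtime overhead;
# these are ballpark assumptions, not measured requirements.
def min_vram_gb(params_billions, bytes_per_param=2.0, overhead=1.2):
    """Approximate GPU memory (GB) needed just to hold the weights."""
    return params_billions * bytes_per_param * overhead

for name, b in [("Qwen3-0.6B", 0.6), ("Qwen3-8B", 8), ("Qwen3-32B", 32)]:
    print(f"{name}: ~{min_vram_gb(b):.0f} GB")
```

The table’s recommendations exceed these weight-only figures because serving also needs room for the KV cache, activations, and batching.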

Step-by-Step Installation Guide

# Step 1: Install Python and Create a Virtual Environment
# Ensure Python (>=3.9) is installed. Then create and activate a virtual environment.
python3 -m venv qwen_env
source qwen_env/bin/activate  # On Windows, use `qwen_env\Scripts\activate`

# Step 2: Install Required Libraries
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118  # CUDA-enabled PyTorch
pip install "transformers>=4.51.0" accelerate  # Qwen 3 support requires transformers 4.51+
pip install bitsandbytes  # Optional: quantized loading for tighter GPU memory budgets

# Step 3: Install the Hugging Face CLI
pip install -U "huggingface_hub[cli]"
# Qwen 3 weights are openly available (Apache 2.0), so no access request or login is required.

# Step 4: Download the Model Files
huggingface-cli download Qwen/Qwen3-8B --local-dir Qwen3-8B

# Step 5: Load the Model Locally
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Local directory path (or pass the Hub ID "Qwen/Qwen3-8B" directly)
local_model_dir = "./Qwen3-8B"

# Load the model with GPU optimization
model = AutoModelForCausalLM.from_pretrained(
    local_model_dir,
    device_map="auto",          # Automatically map model layers to GPU(s)
    torch_dtype=torch.bfloat16  # Use bfloat16 for efficient memory usage
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(local_model_dir)

# Step 6: Run Inference
# Build the prompt with the model's chat template
messages = [{"role": "user", "content": "Explain the theory of relativity in simple terms."}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    enable_thinking=False,  # Set True to enable chain-of-thought "thinking" mode
    return_tensors="pt",
).to(model.device)

# Generate a response
with torch.no_grad():
    outputs = model.generate(
        input_ids,
        max_new_tokens=256,  # Cap the length of the generated reply
        do_sample=True,
        temperature=0.7,     # Lower = more deterministic, higher = more creative
        top_k=50,            # Top-k sampling for diversity
    )

# Decode only the newly generated tokens
response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print("Response:", response)

How to Access Qwen 3 via API

Novita AI offers an affordable, reliable, and simple inference platform with a scalable Qwen 3 API, empowering developers to build AI applications. Try the Novita AI Qwen 3 API demo today!

Option 1: Direct API Integration (Python Example)


Key Features:

  • Unified endpoint: /v3/openai supports OpenAI’s Chat Completions API format.
  • Flexible controls: Adjust temperature, top-p, penalties, and more for tailored results.
  • Streaming & batching: Choose your preferred response mode.
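As a sketch of the direct-integration path, the request below targets the OpenAI-compatible Chat Completions endpoint using only the Python standard library. The model ID `qwen/qwen3-8b` is illustrative; confirm the exact identifier in Novita AI’s model catalog:

```python
# Sketch: call Qwen 3 through Novita AI's OpenAI-compatible endpoint (stdlib only).
# The model ID "qwen/qwen3-8b" is an assumption -- check Novita's catalog for exact IDs.
import json
import os
import urllib.request

ENDPOINT = "https://api.novita.ai/v3/openai/chat/completions"

def build_payload(prompt, model="qwen/qwen3-8b"):
    """Assemble an OpenAI-style Chat Completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,  # Flexible controls: temperature, top_p, penalties, ...
        "max_tokens": 512,
    }

def chat(prompt):
    """POST the request and return the assistant's reply text."""
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['NOVITA_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Usage (requires a NOVITA_API_KEY environment variable):
# reply = chat("Explain the theory of relativity in simple terms.")
```

Because the endpoint follows the OpenAI format, the official `openai` Python SDK works too: point its `base_url` at https://api.novita.ai/v3/openai and pass your Novita key as `api_key`.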

Option 2: Multi-Agent Workflows with OpenAI Agents SDK

Build advanced multi-agent systems by integrating Novita AI with the OpenAI Agents SDK:

  • Plug-and-play: Use Novita AI’s LLMs in any OpenAI Agents workflow.
  • Supports handoffs, routing, and tool use: Design agents that can delegate, triage, or run functions, all powered by Novita AI’s models.
  • Python integration: Simply point the SDK to Novita’s endpoint (https://api.novita.ai/v3/openai) and use your API key.

Connect Qwen 3 API on Third-Party Platforms

  • Hugging Face: Use Qwen 3 in Spaces, pipelines, or with the Transformers library via Novita AI endpoints.
  • Agent & Orchestration Frameworks: Easily connect Novita AI with partner platforms like Continue, AnythingLLM, LangChain, Dify, and Langflow through official connectors and step-by-step integration guides.
  • OpenAI-Compatible API: Enjoy hassle-free migration and integration with tools such as Cline and Cursor, designed for the OpenAI API standard.

Which Methods Are Suitable for You?

Comparison of Local vs. API Access

Aspect        | Local Access                                                  | API Access
Scalability   | Limited; requires manual upgrades.                            | Scales automatically and efficiently.
Flexibility   | High flexibility; full control over settings.                 | Less flexible; depends on provider’s configurations.
Usability     | Requires technical expertise.                                 | Easier to use; no complex setup needed.
Affordability | High initial cost, low ongoing costs; best for long-term use. | Pay-per-use; ideal for small-scale or occasional use.

Recommendations for Different User Groups

  • Researchers → Prefer local access for full control and experiment flexibility.
  • Developers → Use API for fast testing and building apps; go local for custom training.
  • Businesses → API is great for easy integration; local suits teams with stable needs.
  • Small Teams & Individuals → API is more budget-friendly and easier to start with.
  • Non-technical Users → Definitely go with API—no complex setup required.

Whether you’re a researcher, developer, or business team, Qwen 3 adapts to your needs. Local access provides control and customization, while APIs offer instant scalability and low-barrier entry. Qwen 3’s design ensures strong multilingual, reasoning, and tool-augmented capabilities for real-world tasks.

Frequently Asked Questions

What makes Qwen 3 different from other LLMs?

It supports dual thinking modes, strong multilingual instruction, and long context (128k tokens), with open weights and commercial-friendly licensing.

Can I run Qwen 3 on my PC?

Only the smallest models (e.g., 0.6B) are suitable for consumer GPUs. Larger models require A100/H100 setups.

Is API access easier?

Yes! Novita AI and Hugging Face offer low-cost, plug-and-play Qwen 3 APIs—perfect for quick integration and low-latency use.

Novita AI is an all-in-one cloud platform that empowers your AI ambitions. Integrated APIs, serverless deployment, and GPU instances provide the cost-effective tools you need. Eliminate infrastructure overhead, start for free, and make your AI vision a reality.
