Qwen3, an AI model family, is designed for developers seeking cutting-edge capabilities in reasoning, multilingual support, and lightweight efficiency. With free access on Novita AI’s platform and seamless API integration, Qwen3 enables dynamic applications, from coding assistance to complex problem-solving.
Qwen3 models introduce a hybrid problem-solving approach with two modes:
Thinking Mode: For complex problems, the model reasons step by step, delivering thoughtful answers.
Non-Thinking Mode: For simpler tasks, the model provides fast, near-instant responses.
This flexibility lets users control the model’s reasoning effort based on task requirements. Harder problems benefit from extended reasoning, while simpler ones are solved quickly.
By combining these modes, Qwen3 achieves stable and efficient thinking-budget control: performance scales with the reasoning budget allocated to a request. This design makes task-specific budgeting easier, balancing cost efficiency and inference quality.
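As a client-side sketch of task-specific budgeting, the helper below routes hard tasks to Thinking Mode with a larger token budget and easy tasks to fast responses. The `enable_thinking` flag and the use of `max_tokens` as the reasoning budget are assumptions for illustration; check Novita AI's documentation for the actual parameter names.

```python
def build_request(task: str, hard: bool) -> dict:
    # Hypothetical helper: hard tasks get Thinking Mode and a larger
    # token budget; easy tasks get a fast, non-thinking response.
    return {
        "model": "qwen3-0.6b-fp8",
        "messages": [{"role": "user", "content": task}],
        "max_tokens": 4096 if hard else 512,      # reasoning budget
        "extra_body": {"enable_thinking": hard},  # assumed flag name
    }

easy = build_request("What is 2 + 2?", hard=False)
hard_req = build_request("Prove the sum of two odd numbers is even.", hard=True)
```

The same request-building code serves both modes; only the budget and mode flag change per task.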
Multilingual Support
Qwen3 models support 119 languages and dialects, unlocking new possibilities for global applications. Optimized for coding, agentic capabilities, and MCP, Qwen3 enables users worldwide to leverage its power effectively.
Improved Agentic Capabilities
Qwen3 is optimized for coding and agentic workflows, with enhanced support for the Model Context Protocol (MCP), allowing the model to reason about its environment and interact with external tools.
Qwen 3 Small Models
Tied embedding (weight tying) is a technique commonly used in natural language processing (NLP) models to share weights between embedding layers. Specifically, it ties the weights of the input embedding layer and the output projection layer in a neural network, particularly in language models like transformers, halving the parameter count of those layers.
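A minimal NumPy sketch of weight tying: one matrix `E` serves as both the input embedding table (row lookup) and, transposed, the output projection to vocabulary logits. Sizes here are toy values for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model = 10, 4

# One shared weight matrix serves both roles.
E = rng.normal(size=(vocab_size, d_model))

# Input side: token ids -> embeddings (a simple row lookup).
token_ids = np.array([1, 5, 7])
x = E[token_ids]            # shape (3, d_model)

# Output side: hidden states -> vocabulary logits, reusing E transposed.
hidden = rng.normal(size=(3, d_model))
logits = hidden @ E.T       # shape (3, vocab_size)
```

With tying, the model stores one `(vocab_size, d_model)` matrix instead of two, which matters most for small models, where the embedding table is a large fraction of total parameters.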
Training Methods of Qwen 3 Small Models
From the diagram, we can infer that Qwen3-0.6B, 1.7B, and 4B were trained through a Strong-to-Weak Distillation process, which is part of the pipeline for creating Lightweight Models. Here's a step-by-step breakdown of the training process:
Base Models: The process begins with pre-trained Base Models, which act as the foundation for subsequent training and distillation.
Frontier Models:
Base Models are first trained through a multi-stage process to create Frontier Models like Qwen3-235B-A22B and Qwen3-32B.
This training involves:
Stage 1 (Long-CoT Cold Start): Initial training with long chain-of-thought (CoT) reasoning.
Stage 2 (Reasoning RL): Reinforcement learning focused on reasoning tasks.
Stage 3 (Thinking Mode Fusion): Integration of thinking modes (e.g., reasoning and quick-response modes).
Stage 4 (General RL): General reinforcement learning for broader capabilities.
Strong-to-Weak Distillation:
The large Frontier Models (e.g., Qwen3-235B-A22B and Qwen3-32B) are then used as teacher models to guide the training of Lightweight Models like Qwen3-4B.
This distillation process ensures that the smaller models retain the knowledge and performance of the larger models while significantly reducing size and computational requirements.
Lightweight Models:
As a result of this distillation process, Qwen3-0.6B, 1.7B, and 4B are lightweight versions that benefit from the knowledge of the larger models while being optimized for efficiency.
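The distillation step above can be sketched with the standard soft-label objective: the student is trained to match the teacher's temperature-softened output distribution via KL divergence. This is a generic NumPy illustration of the technique, not Qwen3's actual training code.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax over the last axis.
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(teacher_logits, student_logits, T=2.0):
    # KL(teacher || student) on softened distributions: the student is
    # penalized for diverging from the teacher's "soft labels".
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(np.sum(p * (np.log(p) - np.log(q))) / p.shape[0])

# Toy logits: a confident teacher vs. an untrained (uniform) student.
teacher = np.array([[4.0, 1.0, 0.0]])
student = np.zeros((1, 3))
loss = distillation_loss(teacher, student)
```

The loss is zero when the student exactly matches the teacher and grows as the distributions diverge; minimizing it transfers the teacher's behavior into the smaller model.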
How to Access Qwen 3 Small Models via Novita API?
Step 1: Log In and Access the Model Library
Log in to your account and click on the Model Library button.
Step 2: Choose Your Model
Browse through the available options and select the model that suits your needs.
Step 3: Start Your Free Trial
Begin your free trial to explore the capabilities of the selected model.
Step 4: Get Your API Key
To authenticate with the API, Novita AI provides you with an API key. On the "Settings" page, you can copy the API key as indicated in the image.
Step 5: Install the API
Install the OpenAI-compatible SDK using the package manager for your programming language (for Python, `pip install openai`).
After installation, import the necessary libraries into your development environment and initialize the client with your API key to start interacting with Novita AI's LLM service. Below is an example of using the Chat Completions API in Python.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",
    api_key="<YOUR Novita AI API Key>",
)

model = "qwen3-0.6b-fp8"
stream = True  # or False
max_tokens = 2048
system_content = """Be a helpful assistant"""
temperature = 1
top_p = 1
min_p = 0
top_k = 50
presence_penalty = 0
frequency_penalty = 0
repetition_penalty = 1
response_format = {"type": "text"}

chat_completion_res = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "system",
            "content": system_content,
        },
        {
            "role": "user",
            "content": "Hi there!",
        },
    ],
    stream=stream,
    max_tokens=max_tokens,
    temperature=temperature,
    top_p=top_p,
    presence_penalty=presence_penalty,
    frequency_penalty=frequency_penalty,
    response_format=response_format,
    # Sampling parameters not in the standard OpenAI API go in extra_body.
    extra_body={
        "top_k": top_k,
        "repetition_penalty": repetition_penalty,
        "min_p": min_p,
    },
)

if stream:
    for chunk in chat_completion_res:
        print(chunk.choices[0].delta.content or "", end="")
else:
    print(chat_completion_res.choices[0].message.content)
Qwen3 offers unparalleled versatility with its hybrid thinking modes, multilingual capabilities, and lightweight efficiency. Whether you’re solving complex problems or building global applications, Qwen3 empowers you to achieve more. Start your journey today with Novita AI’s free access and explore the future of AI-powered development.
How do I get started with Qwen3 on Novita AI?
Log in to Novita AI, select a model, get your API key, and integrate it into your project with the provided documentation.
Are Qwen3 models free to use?
Yes! Novita AI offers free access to Qwen3 models with easy API integration.
Novita AI is an AI cloud platform that offers developers an easy way to deploy AI models through a simple API, while also providing affordable and reliable GPU cloud resources for building and scaling.