
To support the open-source AI community and accelerate innovation in natural language processing, Novita AI has made five powerful models freely accessible via API. These include the compact yet capable Llama 3.2 1B Instruct, the versatile Qwen2.5-7B Instruct, the high-performing GLM-4-9B-0414 and GLM-Z1-9B-0414, as well as the multilingual and multi-functional embedding model BGE-M3. By offering open access to these models, Novita AI aims to empower developers, researchers, and startups to build, test, and scale AI applications more efficiently—without the burden of high infrastructure costs.
Llama 3.2 1B Instruct

- Model Size: 1.23B parameters
- Architecture: Optimized Transformer with Grouped-Query Attention (GQA), SwiGLU activation, Rotary Positional Embeddings (RoPE), and RMSNorm
- Context Length: 128K tokens
- Multilingual: Officially supports English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai; trained on a broader set of languages
- Modality: Text-to-text (input and output)
- Training Data: Trained on up to 9 trillion tokens from publicly available online data
- Open Source: ✅
- Benchmark: Demonstrates strong performance across tasks such as instruction following, summarization, prompt rewriting, and tool use; competitive with other models in its parameter class
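As a quick sketch of how an instruction-following request to this model might look, the snippet below assembles the arguments for a chat completions call. The model identifier `meta-llama/llama-3.2-1b-instruct` is an assumption for illustration; check the Model Library for the exact name used on Novita AI.

```python
# Minimal sketch of an instruction-following request to Llama 3.2 1B Instruct
# via Novita AI's OpenAI-compatible endpoint. The model identifier below is an
# assumption; confirm the exact name in the Model Library.

def build_chat_request(model: str, system: str, user: str) -> dict:
    """Assemble the keyword arguments for a chat completions call."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
        "max_tokens": 256,
    }

request = build_chat_request(
    "meta-llama/llama-3.2-1b-instruct",
    "You are a concise assistant.",
    "Summarize the benefits of grouped-query attention in one sentence.",
)

# Uncomment to send the request (requires the openai package and a valid key):
# from openai import OpenAI
# client = OpenAI(base_url="https://api.novita.ai/v3/openai", api_key="<key>")
# print(client.chat.completions.create(**request).choices[0].message.content)
```

Keeping the request construction in a small helper like this makes it easy to swap in any of the other free models by changing only the model string.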
Qwen2.5-7B Instruct

Qwen 2.5 7B is a multilingual, open-source transformer model with strong performance across general, mathematical, coding, and multilingual tasks. It’s built for versatility, lightweight deployment, and broad language support.
- Model Size: 7.61B parameters
- Architecture: Transformer with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
- Context Length: 128K tokens
- Multilingual: Supports over 29 languages
- Modality: Text-to-text
- Training Data: Trained on 18T+ tokens
- Open Source: ✅
- Benchmark: Qwen2.5-7B Instruct ranks at or near the top of its size class across general tasks, STEM, coding, and multilingual understanding, despite its relatively compact parameter count.
GLM-4-9B-0414 and GLM-Z1-9B-0414

GLM-4-9B-0414 and GLM-Z1-9B-0414 are two 9-billion-parameter open-source language models developed by THUDM, each optimized for distinct tasks.
- GLM-4-9B-0414: Designed for dialogue generation, this model inherits the architecture of GLM-4-32B and excels in tasks such as multi-turn conversations, translation, and summarization. It supports a 32K context window and is suitable for resource-constrained deployments requiring robust language understanding and generation capabilities.
- GLM-Z1-9B-0414: Focused on mathematical reasoning and general tasks, this model incorporates techniques like extended reinforcement learning and pairwise ranking alignment. It demonstrates strong performance in mathematics, code, and logic tasks, outperforming many open-source models in its weight class.
| Feature | Value |
|---|---|
| Model Size | 9B parameters |
| Strengths | – GLM-4-9B-0414: High performance-to-size ratio, strong dialogue generation and language understanding – GLM-Z1-9B-0414: Excels at math, code, and reasoning tasks |
| Task Orientation | – GLM-4-9B-0414: Chat-oriented – GLM-Z1-9B-0414: Reasoning-focused |
| Modalities | Text-to-Text with HTML/SVG visualization support |
| Context Window | 32K tokens |
| Training & Alignment | Distilled from GLM-4-32B. The base model was pre-trained on 15 trillion tokens of high-quality data (especially synthetic reasoning data) and aligned through human preference tuning for dialogue tasks. |
BGE-M3

BGE-M3 is a cutting-edge text embedding model developed by the Beijing Academy of Artificial Intelligence (BAAI). It is engineered for exceptional versatility, offering strong performance across three core dimensions: functionality, language support, and input granularity. BGE-M3 achieves state-of-the-art results on multiple benchmarks, including MKQA and MLDR, consistently outperforming competing models in both monolingual and cross-lingual retrieval scenarios.
- Multi-Functionality: BGE-M3 seamlessly integrates three retrieval strategies within a unified architecture:
  - Dense Retrieval – Generates a single vector representation per input, ideal for general semantic matching.
  - Sparse Retrieval – Emphasizes token-level importance, similar to traditional lexical matching.
  - Multi-Vector Retrieval – Produces multiple vectors per input to capture fine-grained semantics and boost retrieval precision.
- Multi-Linguality: Supports over 100 languages, enabling both multilingual and cross-lingual retrieval capabilities.
- Multi-Granularity: Designed to handle a wide range of input lengths—from short phrases to long-form documents—supporting up to 8192 tokens per input.
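To make the dense-retrieval strategy concrete, here is a minimal sketch of scoring a document against a query with cosine similarity over dense embeddings. The embeddings endpoint and the model identifier `baai/bge-m3` are assumptions for illustration; check Novita AI's documentation for the exact names. The similarity helper itself is plain Python.

```python
# Dense retrieval with BGE-M3 embeddings: embed query and document, then rank
# by cosine similarity. The API call below is a hedged sketch; the similarity
# helper is self-contained.
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical call to obtain embeddings via the OpenAI-compatible API
# (model name and endpoint are assumptions; requires a valid API key):
# from openai import OpenAI
# client = OpenAI(base_url="https://api.novita.ai/v3/openai", api_key="<key>")
# resp = client.embeddings.create(
#     model="baai/bge-m3",
#     input=["query text", "candidate document"],
# )
# query_vec = resp.data[0].embedding
# doc_vec = resp.data[1].embedding
# print(cosine_similarity(query_vec, doc_vec))
```

In a real semantic-search setup you would embed all documents once, store the vectors, and at query time embed only the query and rank documents by this score.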
How to Access Free Models on Novita AI?
Step 1: Log In and Access the Model Library
Log in to your account and click on the Model Library button.

Step 2: Choose Your Model
Browse through the available options and select the model that suits your needs.

Step 3: Start Your Free Trial
Begin your free trial to explore the capabilities of the selected model.

Step 4: Get Your API Key
To authenticate with the API, you will need an API key. Open the “Settings” page and copy your API key as indicated in the image.

Step 5: Install the API Client
Install the client library using the package manager for your programming language.
After installation, import the necessary libraries into your development environment and initialize the client with your API key to start interacting with Novita AI's LLMs. Below is an example of using the chat completions API in Python.
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",
    api_key="<YOUR Novita AI API Key>",
)

model = "model name"
stream = True  # or False
max_tokens = 2048
system_content = """Be a helpful assistant"""
temperature = 1
top_p = 1
min_p = 0
top_k = 50
presence_penalty = 0
frequency_penalty = 0
repetition_penalty = 1
response_format = {"type": "text"}

chat_completion_res = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "system",
            "content": system_content,
        },
        {
            "role": "user",
            "content": "Hi there!",
        },
    ],
    stream=stream,
    max_tokens=max_tokens,
    temperature=temperature,
    top_p=top_p,
    presence_penalty=presence_penalty,
    frequency_penalty=frequency_penalty,
    response_format=response_format,
    extra_body={
        "top_k": top_k,
        "repetition_penalty": repetition_penalty,
        "min_p": min_p,
    },
)

if stream:
    for chunk in chat_completion_res:
        print(chunk.choices[0].delta.content or "", end="")
else:
    print(chat_completion_res.choices[0].message.content)
```
Whether you’re building an intelligent chatbot, semantic search engine, or multilingual recommendation system, free access to Novita AI’s models offers everything you need to get started fast. With world-class performance and easy API integration, these models make scalable AI more accessible than ever.
Novita AI is an AI cloud platform that offers developers an easy way to deploy AI models using our simple API, while also providing an affordable and reliable GPU cloud for building and scaling.
Recommended Reading
- Tutorial: How to Access Gemma 3 27B Locally, via API, on Cloud GPU
- Novita AI Now Supports OpenAI Agents SDK
- Llama 3.2 3B vs DeepSeek V3: Comparing Efficiency and Performance