
To support the open-source AI community and accelerate innovation in natural language processing, Novita AI has made five powerful models freely accessible via API. These include the compact yet capable Llama 3.2 1B Instruct, the versatile Qwen2.5-7B Instruct, the high-performing GLM-4-9B-0414 and GLM-Z1-9B-0414, as well as the multilingual and multi-functional embedding model BGE-M3. By offering open access to these models, Novita AI aims to empower developers, researchers, and startups to build, test, and scale AI applications more efficiently—without the burden of high infrastructure costs.
Llama 3.2 1B Instruct

- Model Size: 1.23B parameters
- Architecture: Optimized Transformer with Grouped-Query Attention (GQA), SwiGLU activation, Rotary Positional Embeddings (RoPE), and RMSNorm
- Context Length: 128K tokens
- Multilingual: Officially supports English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai; trained on a broader set of languages
- Modality: Text-to-text (input and output)
- Training Data: Trained on up to 9 trillion tokens from publicly available online data
- Open Source: ✅
- Benchmark: Demonstrates strong performance across tasks such as instruction following, summarization, prompt rewriting, and tool use; competitive with other models in its parameter class
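As a quick sketch of how an instruction-following request to this model might look, the snippet below assembles the arguments for a chat completions call. The model identifier `meta-llama/llama-3.2-1b-instruct` is an assumption for illustration; check the Model Library for the exact name used on Novita AI.

```python
# Minimal sketch of an instruction-following request to Llama 3.2 1B Instruct
# via Novita AI's OpenAI-compatible endpoint. The model identifier below is an
# assumption; confirm the exact name in the Model Library.

def build_chat_request(model: str, system: str, user: str) -> dict:
    """Assemble the keyword arguments for a chat completions call."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
        "max_tokens": 256,
    }

request = build_chat_request(
    "meta-llama/llama-3.2-1b-instruct",
    "You are a concise assistant.",
    "Summarize the benefits of grouped-query attention in one sentence.",
)

# Uncomment to send the request (requires the openai package and a valid key):
# from openai import OpenAI
# client = OpenAI(base_url="https://api.novita.ai/v3/openai", api_key="<key>")
# print(client.chat.completions.create(**request).choices[0].message.content)
```

Keeping the request construction in a small helper like this makes it easy to swap in any of the other free models by changing only the model string.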
Qwen2.5-7B Instruct

Qwen 2.5 7B is a multilingual, open-source transformer model with strong performance across general, mathematical, coding, and multilingual tasks. It’s built for versatility, lightweight deployment, and broad language support.
- Model Size: 7.61B parameters
- Architecture: Transformer with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
- Context Length: 128K tokens
- Multilingual: Supports over 29 languages
- Modality: Text-to-text
- Training Data: Trained on 18T+ tokens
- Open Source: ✅
- Benchmark: Qwen2.5-7B Instruct ranks at or near the top of its size class across general tasks, STEM, coding, and multilingual understanding, despite its relatively compact parameter count.
GLM-4-9B-0414 and GLM-Z1-9B-0414

GLM-4-9B-0414 and GLM-Z1-9B-0414 are two 9-billion-parameter open-source language models developed by THUDM, each optimized for distinct tasks.
- GLM-4-9B-0414: Designed for dialogue generation, this model inherits the architecture of GLM-4-32B and excels in tasks such as multi-turn conversations, translation, and summarization. It supports a 32K context window and is suitable for resource-constrained deployments requiring robust language understanding and generation capabilities.
- GLM-Z1-9B-0414: Focused on mathematical reasoning and general tasks, this model incorporates techniques like extended reinforcement learning and pairwise ranking alignment. It demonstrates strong performance in mathematics, code, and logic tasks, outperforming many open-source models in its weight class.
| Feature | Value |
|---|---|
| Model Size | 9B parameters |
| Strengths | – GLM-4-9B-0414: High performance-to-size ratio, strong dialogue generation and language understanding – GLM-Z1-9B-0414: Excels at math, code, and reasoning tasks |
| Task Orientation | – GLM-4-9B-0414: Chat-oriented – GLM-Z1-9B-0414: Reasoning-focused |
| Modalities | Text-to-Text with HTML/SVG visualization support |
| Context Window | 32K tokens |
| Training & Alignment | Distilled from GLM-4-32B. The base model was pre-trained on 15 trillion tokens of high-quality data (especially synthetic reasoning data) and aligned through human preference tuning for dialogue tasks. |
BGE-M3

BGE-M3 is a cutting-edge text embedding model developed by the Beijing Academy of Artificial Intelligence (BAAI). It is engineered for exceptional versatility, offering strong performance across three core dimensions: functionality, language support, and input granularity. BGE-M3 achieves state-of-the-art results on multiple benchmarks, including MKQA and MLDR, consistently outperforming competing models in both monolingual and cross-lingual retrieval scenarios.
- Multi-Functionality: BGE-M3 seamlessly integrates three retrieval strategies within a unified architecture:
  - Dense Retrieval – Generates a single vector representation per input, ideal for general semantic matching.
  - Sparse Retrieval – Emphasizes token-level importance, similar to traditional lexical matching.
  - Multi-Vector Retrieval – Produces multiple vectors per input to capture fine-grained semantics and boost retrieval precision.
- Multi-Linguality: Supports over 100 languages, enabling both multilingual and cross-lingual retrieval capabilities.
- Multi-Granularity: Designed to handle a wide range of input lengths—from short phrases to long-form documents—supporting up to 8192 tokens per input.
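To make the dense-retrieval strategy concrete, here is a minimal sketch of scoring a document against a query with cosine similarity over dense embeddings. The embeddings endpoint and the model identifier `baai/bge-m3` are assumptions for illustration; check Novita AI's documentation for the exact names. The similarity helper itself is plain Python.

```python
# Dense retrieval with BGE-M3 embeddings: embed query and document, then rank
# by cosine similarity. The API call below is a hedged sketch; the similarity
# helper is self-contained.
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical call to obtain embeddings via the OpenAI-compatible API
# (model name and endpoint are assumptions; requires a valid API key):
# from openai import OpenAI
# client = OpenAI(base_url="https://api.novita.ai/v3/openai", api_key="<key>")
# resp = client.embeddings.create(
#     model="baai/bge-m3",
#     input=["query text", "candidate document"],
# )
# query_vec = resp.data[0].embedding
# doc_vec = resp.data[1].embedding
# print(cosine_similarity(query_vec, doc_vec))
```

In a real semantic-search setup you would embed all documents once, store the vectors, and at query time embed only the query and rank documents by this score.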
How to Access Free Models on Novita AI?
Step 1: Log In and Access the Model Library
Log in to your account and click on the Model Library button.

Step 2: Choose Your Model
Browse through the available options and select the model that suits your needs.

Step 3: Start Your Free Trial
Begin your free trial to explore the capabilities of the selected model.

Step 4: Get Your API Key
To authenticate with the API, you will need an API key. Open the “Settings” page and copy your API key as indicated in the image.

Step 5: Install the API Client
Install the client library using the package manager for your programming language.
After installation, import the necessary libraries into your development environment and initialize the client with your API key to start interacting with Novita AI's LLMs. Below is an example of using the chat completions API in Python.
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",
    api_key="<YOUR Novita AI API Key>",
)

model = "model name"
stream = True  # or False
max_tokens = 2048
system_content = """Be a helpful assistant"""
temperature = 1
top_p = 1
min_p = 0
top_k = 50
presence_penalty = 0
frequency_penalty = 0
repetition_penalty = 1
response_format = {"type": "text"}

chat_completion_res = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "system",
            "content": system_content,
        },
        {
            "role": "user",
            "content": "Hi there!",
        },
    ],
    stream=stream,
    max_tokens=max_tokens,
    temperature=temperature,
    top_p=top_p,
    presence_penalty=presence_penalty,
    frequency_penalty=frequency_penalty,
    response_format=response_format,
    extra_body={
        "top_k": top_k,
        "repetition_penalty": repetition_penalty,
        "min_p": min_p,
    },
)

if stream:
    for chunk in chat_completion_res:
        print(chunk.choices[0].delta.content or "", end="")
else:
    print(chat_completion_res.choices[0].message.content)
```
Whether you’re building an intelligent chatbot, semantic search engine, or multilingual recommendation system, free access to Novita AI’s models offers everything you need to get started fast. With world-class performance and easy API integration, these models make scalable AI more accessible than ever.
Novita AI is an AI cloud platform that offers developers an easy way to deploy AI models using our simple API, while also providing an affordable and reliable GPU cloud for building and scaling.
Recommended Reading
- Tutorial: How to Access Gemma 3 27B Locally, via API, on Cloud GPU
- Novita AI Now Supports OpenAI Agents SDK
- Llama 3.2 3B vs DeepSeek V3: Comparing Efficiency and Performance