Qwen, Llama, GLM, and BGE Are Free on Novita AI – Try Them Now!

5 Free LLM Models on Novita AI

To support the open-source AI community and accelerate innovation in natural language processing, Novita AI has made five powerful models freely accessible via API. These include the compact yet capable Llama 3.2 1B Instruct, the versatile Qwen2.5-7B Instruct, the high-performing GLM-4-9B-0414 and GLM-Z1-9B-0414, as well as the multilingual and multi-functional embedding model BGE-M3. By offering open access to these models, Novita AI aims to empower developers, researchers, and startups to build, test, and scale AI applications more efficiently—without the burden of high infrastructure costs.

Llama 3.2 1B Instruct
  • Model Size: 1.23B parameters
  • Architecture: Optimized Transformer with Grouped-Query Attention (GQA), SwiGLU activation, Rotary Positional Embeddings (RoPE), and RMSNorm
  • Context Length: 128K tokens
  • Multilingual: Officially supports English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai; trained on a broader set of languages
  • Modality: Text-to-text (input and output)
  • Training Data: Trained on up to 9 trillion tokens from publicly available online data
  • Open Source: ✅
  • Benchmark: Demonstrates strong performance across tasks such as instruction following, summarization, prompt rewriting, and tool use; competitive with other models in its parameter class
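Rotary Positional Embeddings (RoPE) appear in every model on this list, so the idea is worth a quick sketch. RoPE rotates pairs of dimensions in each query/key vector by a position-dependent angle, encoding position without changing the vector's norm. The following is a simplified illustration of the concept, not the models' actual implementation:

```python
import math

def rope(vec, pos, base=10000.0):
    """Apply a rotary positional embedding to one head vector.

    Each pair of dimensions (2i, 2i+1) is rotated by an angle that
    depends on the token position and the pair index; the rotation
    preserves the vector's norm.
    """
    d = len(vec)
    out = [0.0] * d
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out[i] = x * c - y * s
        out[i + 1] = x * s + y * c
    return out
```

Because only rotations are applied, the dot product between two rotated vectors depends on their relative (not absolute) positions, which is what makes RoPE extrapolate well to long contexts.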

Qwen2.5-7B Instruct


Qwen 2.5 7B is a multilingual, open-source transformer model with strong performance across general, mathematical, coding, and multilingual tasks. It’s built for versatility, lightweight deployment, and broad language support.

  • Model Size: 7.61B parameters
  • Architecture: Transformer with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
  • Context Length: 128K tokens
  • Multilingual: Supports over 29 languages
  • Modality: Text-to-text
  • Training Data: Trained on 18T+ tokens
  • Open Source: ✅
  • Benchmark: Performs strongly across general tasks, STEM, coding, and multilingual understanding, ranking at or near the top of its size class despite a relatively compact parameter count.
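Both Qwen 2.5 and Llama 3.2 list SwiGLU among their architectural choices: the feed-forward layer is gated by a SiLU non-linearity. Here is a minimal pure-Python sketch of the operation with toy weight matrices `W` and `V` (illustrative only, not Qwen's actual implementation):

```python
import math

def silu(x):
    """SiLU (swish) activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))

def swiglu(x, W, V):
    """SwiGLU gate: SiLU(x @ W) * (x @ V), elementwise.

    x is an input vector; W and V are toy weight matrices
    given as lists of rows.
    """
    xw = [sum(xi * wij for xi, wij in zip(x, col)) for col in zip(*W)]
    xv = [sum(xi * vij for xi, vij in zip(x, col)) for col in zip(*V)]
    return [silu(a) * b for a, b in zip(xw, xv)]
```

In a real transformer block the output of this gate is projected back down by a third matrix; the gating lets the network learn which feed-forward channels to suppress per token.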

GLM-4-9B-0414 and GLM-Z1-9B-0414

GLM-4-9B-0414 and GLM-Z1-9B-0414 are two 9-billion-parameter open-source language models developed by THUDM, each optimized for distinct tasks.

  • GLM-4-9B-0414: Designed for dialogue generation, this model inherits the architecture of GLM-4-32B and excels in tasks such as multi-turn conversations, translation, and summarization. It supports a 32K context window and is suitable for resource-constrained deployments requiring robust language understanding and generation capabilities. ​
  • GLM-Z1-9B-0414: Focused on mathematical reasoning and general tasks, this model incorporates techniques like extended reinforcement learning and pairwise ranking alignment. It demonstrates strong performance in mathematics, code, and logic tasks, outperforming many open-source models in its weight class.
| Feature | Value |
| --- | --- |
| Model Size | 9B parameters |
| Strengths | GLM-4-9B-0414: high performance-to-size ratio, robust dialogue generation; GLM-Z1-9B-0414: strong performance on math, code, and reasoning tasks |
| Task Orientation | GLM-4-9B-0414: chat-oriented; GLM-Z1-9B-0414: reasoning-focused |
| Modalities | Text-to-text, with HTML/SVG visualization support |
| Context Window | 32K tokens |
| Training & Alignment | Distilled from GLM-4-32B; the base model was pre-trained on 15 trillion tokens of high-quality data (including synthetic reasoning data) and aligned through human preference tuning for dialogue tasks |

BGE-M3

BGE-M3 is a cutting-edge text embedding model developed by the Beijing Academy of Artificial Intelligence (BAAI). It is engineered for exceptional versatility, offering strong performance across three core dimensions: functionality, language support, and input granularity. BGE-M3 achieves state-of-the-art results on multiple benchmarks, including MKQA and MLDR, consistently outperforming competing models in both monolingual and cross-lingual retrieval scenarios.

  • Multi-Functionality: BGE-M3 seamlessly integrates three retrieval strategies within a unified architecture:
    • Dense Retrieval – Generates a single vector representation per input, ideal for general semantic matching.
    • Sparse Retrieval – Emphasizes token-level importance, similar to traditional lexical matching.
    • Multi-Vector Retrieval – Produces multiple vectors per input to capture fine-grained semantics and boost retrieval precision.
  • Multi-Linguality: Supports over 100 languages, enabling both multilingual and cross-lingual retrieval capabilities.
  • Multi-Granularity: Designed to handle a wide range of input lengths—from short phrases to long-form documents—supporting up to 8192 tokens per input.
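Once BGE-M3 has produced its outputs, the three retrieval strategies yield scores that a downstream search system can combine. The sketch below shows one way such scores might be computed and fused; the vectors and token weights are assumed to come from the model, and the 0.7/0.3 fusion weights are arbitrary examples, not part of BGE-M3 itself:

```python
import math

def dense_score(q, d):
    """Dense retrieval: cosine similarity of single-vector embeddings."""
    dot = sum(a * b for a, b in zip(q, d))
    nq = math.sqrt(sum(a * a for a in q))
    nd = math.sqrt(sum(b * b for b in d))
    return dot / (nq * nd)

def sparse_score(q_weights, d_weights):
    """Sparse retrieval: sum of token-weight products over shared tokens,
    similar to lexical matching."""
    return sum(w * d_weights[t] for t, w in q_weights.items() if t in d_weights)

def hybrid_score(dense, sparse, w_dense=0.7, w_sparse=0.3):
    """Fuse the two signals with a weighted sum (weights are illustrative)."""
    return w_dense * dense + w_sparse * sparse
```

Multi-vector retrieval works analogously but keeps several vectors per input and aggregates a late-interaction score (e.g., max similarity per query vector), trading storage for precision.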

How to Access the Free Models on Novita AI

Step 1: Log In and Access the Model Library

Log in to your account and click on the Model Library button.


Step 2: Choose Your Model

Browse through the available options and select the model that suits your needs.


Step 3: Start Your Free Trial

Begin your free trial to explore the capabilities of the selected model.


Step 4: Get Your API Key

To authenticate with the API, you need an API key. Open the Settings page and copy your API key as shown in the image.


Step 5: Install the Client Library

Install the OpenAI-compatible client library using the package manager for your language (for Python, pip install openai).

After installation, import the library and initialize the client with your API key to start interacting with Novita AI LLMs. Here is a Python example using the chat completions API.

from openai import OpenAI

client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",
    api_key="<YOUR Novita AI API Key>",
)

model = "model name"  # replace with a model ID from the Model Library
stream = True # or False
max_tokens = 2048
system_content = """Be a helpful assistant"""
temperature = 1
top_p = 1
min_p = 0
top_k = 50
presence_penalty = 0
frequency_penalty = 0
repetition_penalty = 1
response_format = { "type": "text" }

chat_completion_res = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "system",
            "content": system_content,
        },
        {
            "role": "user",
            "content": "Hi there!",
        }
    ],
    stream=stream,
    max_tokens=max_tokens,
    temperature=temperature,
    top_p=top_p,
    presence_penalty=presence_penalty,
    frequency_penalty=frequency_penalty,
    response_format=response_format,
    extra_body={
      "top_k": top_k,
      "repetition_penalty": repetition_penalty,
      "min_p": min_p
    }
  )

if stream:
    for chunk in chat_completion_res:
        print(chunk.choices[0].delta.content or "", end="")
else:
    print(chat_completion_res.choices[0].message.content)
  
  

Whether you’re building an intelligent chatbot, semantic search engine, or multilingual recommendation system, free access to Novita AI’s models offers everything you need to get started fast. With world-class performance and easy API integration, these models make scalable AI more accessible than ever.

Novita AI is an AI cloud platform that offers developers an easy way to deploy AI models through a simple API, while also providing an affordable and reliable GPU cloud for building and scaling.

