Qwen3 Embedding 8B Now Available on Novita AI: Next Level Search

Table Of Contents

What is an Embedding Model?
What is the Qwen3 Embedding Model?
How to Access Qwen 3 Embedding 8B?

Traditional search engines rely heavily on keyword matching, often missing the true intent or context behind user queries. Embedding models revolutionize this process by transforming queries and documents into dense vectors that capture deep semantic meaning. This enables highly relevant and context-aware retrieval—even when exact keywords don’t match.

Now available on Novita AI, the powerful Qwen3 Embedding 8B model supports long inputs, multilingual understanding, and instruction-aware customization, setting a new standard for search, recommendation, and knowledge management applications.

Start building with Novita AI today!

What is an Embedding Model?

An embedding model is a machine learning model that transforms complex data, such as words, sentences, images, or audio, into compact, fixed-length numerical vectors. These vectors capture semantic relationships: inputs with similar meaning are mapped to nearby points, so similarity can be measured with simple vector operations such as cosine similarity.

From Qdrant
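To make "nearby in vector space" concrete, here is a minimal Python sketch of cosine similarity, the measure most commonly used to compare embeddings. The vectors are made-up 4-dimensional toys, not real model output; an actual Qwen3 Embedding 8B vector has 4096 values, but the computation is the same.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = unrelated (orthogonal)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)

# Toy 4-dimensional "embeddings" with invented values, for illustration only.
cat = [0.9, 0.1, 0.0, 0.2]
kitten = [0.85, 0.15, 0.05, 0.25]
invoice = [0.0, 0.1, 0.95, 0.1]

print(round(cosine_similarity(cat, kitten), 3))   # high: similar meaning
print(round(cosine_similarity(cat, invoice), 3))  # low: unrelated meaning
```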

What Problems Do Embedding Models Solve?

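Embeddings enable semantic search: queries and documents are represented as vectors, so relevant content can be found by vector similarity even when it shares no keywords with the query. In retrieval-augmented generation (RAG) systems, this is how the most relevant passages are retrieved and passed to the language model to ground its answer.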

From Blog
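The retrieval step of a RAG pipeline can be sketched in a few lines: score every document against the query by cosine similarity, keep the top k, and place them in the prompt as context. This is a minimal illustration under stated assumptions: the 3-dimensional vectors are invented stand-ins for real embeddings, build_rag_prompt is a hypothetical helper, and a production system would typically use a vector database instead of a linear scan.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query_vec, doc_vecs, k=2):
    """Return indices of the k documents most similar to the query, best first."""
    scored = [(cosine(query_vec, d), i) for i, d in enumerate(doc_vecs)]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]

def build_rag_prompt(question, docs, doc_vecs, query_vec, k=2):
    """Hypothetical helper: put the k most relevant documents in front of the question."""
    context = "\n".join(f"- {docs[i]}" for i in top_k(query_vec, doc_vecs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

docs = [
    "Refunds are issued within 5 business days.",
    "Our office is closed on public holidays.",
    "To request a refund, open a ticket in the Help Center.",
]
# Invented 3-d vectors standing in for real document embeddings.
doc_vecs = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.9], [0.8, 0.3, 0.1]]
# Stands in for the embedding of "How do I get my money back?" (no keyword overlap with the docs).
query_vec = [0.95, 0.2, 0.0]

print(build_rag_prompt("How do I get my money back?", docs, doc_vecs, query_vec))
```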

1. E-Commerce Search and Recommendations
Platforms like Instacart and Taobao utilize embedding-based retrieval systems to improve product search and recommendations. By understanding the semantic relationships between products and user queries, these systems can suggest items that align with user preferences, even when specific keywords aren’t used.

2. Content Discovery
Media platforms use embeddings to recommend articles, videos, or music based on user behavior and preferences. For instance, if a user reads an article about space exploration, the system might suggest related content on astronomy or rocket technology.

3. Enterprise Knowledge Management
Organizations use embedding models to power semantic search over internal documents and knowledge bases. This lets employees find relevant information efficiently, even when their search terms don’t exactly match the document wording.

4. Customer Support Chatbots
Embedding models help chatbots understand and respond to user queries. By mapping user questions to semantically similar answers in a knowledge base, chatbots can provide accurate, contextually appropriate responses.
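The content discovery use case can be sketched the same way. One simple approach (an illustrative assumption, not necessarily how any platform named above works) is to average the embeddings of items a user has already consumed into a profile vector, then rank unseen items by cosine similarity to that profile. The catalog and its vectors below are hypothetical.

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def mean_vector(vectors):
    dim = len(vectors[0])
    return [sum(v[j] for v in vectors) / len(vectors) for j in range(dim)]

def recommend(history_ids, catalog, k=2):
    """Rank unseen items by cosine similarity to the mean of the user's history embeddings."""
    profile = normalize(mean_vector([catalog[i] for i in history_ids]))
    candidates = []
    for item_id, vec in catalog.items():
        if item_id in history_ids:
            continue  # skip items the user has already read
        score = sum(p * x for p, x in zip(profile, normalize(vec)))
        candidates.append((score, item_id))
    candidates.sort(reverse=True)
    return [item_id for _, item_id in candidates[:k]]

# Hypothetical article embeddings: invented 3-d vectors roughly meaning [space, cooking, finance].
catalog = {
    "mars-rover-landing": [0.9, 0.0, 0.1],
    "rocket-engines-101": [0.8, 0.1, 0.2],
    "telescope-buying-guide": [0.7, 0.2, 0.1],
    "sourdough-basics": [0.0, 0.95, 0.1],
    "index-fund-primer": [0.1, 0.0, 0.9],
}
print(recommend(["mars-rover-landing"], catalog))
```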

Evaluating Embedding Models

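The task categories below come from MTEB (Massive Text Embedding Benchmark), a widely used benchmark for comparing embedding models across diverse applications. A model's score in each category shows where it is strong, which helps researchers and practitioners pick the model that fits their workload: a search engine should weigh Retrieval and Rerank most heavily, while a topic-tagging pipeline cares more about Classification and Clustering.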

Mean (Task): Average score over all individual tasks. Task types that contain many tasks (e.g., Retrieval) carry more weight.
Mean (Type): Scores are first averaged within each task type, then those per-type averages are averaged, so every task type counts equally no matter how many tasks it contains.
Bitext Mining: Finding sentence pairs in different languages that are translations of each other, important for building multilingual corpora.
Class. (Classification): Assigning predefined labels to texts, such as sentiment or topic.
Clust. (Clustering): Grouping similar texts without labels to discover topics or structure.
Inst. Retri. (Instruction Retrieval): Retrieving relevant documents for a query while following an accompanying natural-language instruction, which tests how well a model adapts retrieval to the stated task.
Multi. Class. (Multilabel Classification): Assigning one or more labels to each input, e.g., tagging a news article with several topics.
Pair. Class. (Pair Classification): Determining relationships between text pairs, like duplicates or paraphrases.
Rerank: Reordering candidate lists (e.g., search results) to improve relevance.
Retri. (Retrieval): Fetching relevant documents/passages from large corpora based on queries.
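The difference between Mean (Task) and Mean (Type) shows up as soon as task types have different numbers of tasks. The sketch below uses invented scores (not real leaderboard numbers) for three retrieval tasks and one clustering task: the per-task mean is pulled toward the retrieval scores, while the per-type mean gives clustering equal weight.

```python
from collections import defaultdict
from statistics import mean

def mean_task(scores):
    """Mean (Task): plain average over every individual task."""
    return mean(score for _, score in scores)

def mean_type(scores):
    """Mean (Type): average within each task type first, then average those per-type means."""
    by_type = defaultdict(list)
    for task_type, score in scores:
        by_type[task_type].append(score)
    return mean(mean(type_scores) for type_scores in by_type.values())

# Hypothetical (task_type, score) pairs, for illustration only.
scores = [
    ("Retrieval", 70.0), ("Retrieval", 74.0), ("Retrieval", 72.0),
    ("Clustering", 50.0),
]
print(mean_task(scores))  # 66.5
print(mean_type(scores))  # 61.0
```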

What is Qwen 3 Embedding Model?

Model	Size	Layers	Sequence Length	Embedding Dimension	MRL Support	Instruct Aware
Qwen3 Embedding 0.6B	0.6B	28	32K	1024	Yes	Yes
Qwen3 Embedding 4B	4B	36	32K	2560	Yes	Yes
Qwen3 Embedding 8B	8B	36	32K	4096	Yes	Yes
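The embedding dimension in the table translates directly into storage cost. As a rough worked example, assuming raw float32 vectors (4 bytes per value) and ignoring index overhead or compression, the footprint per million stored documents can be computed like this:

```python
# Assumption: raw float32 storage (4 bytes per value), no index overhead or quantization.
BYTES_PER_FLOAT32 = 4

# Embedding dimensions taken from the table above.
dims = {
    "Qwen3 Embedding 0.6B": 1024,
    "Qwen3 Embedding 4B": 2560,
    "Qwen3 Embedding 8B": 4096,
}

def index_size_gb(dim, num_vectors, bytes_per_value=BYTES_PER_FLOAT32):
    """Raw storage for num_vectors embeddings, in gigabytes (10**9 bytes)."""
    return dim * bytes_per_value * num_vectors / 1e9

for name, dim in dims.items():
    print(f"{name}: {dim * BYTES_PER_FLOAT32} bytes/vector, "
          f"{index_size_gb(dim, 1_000_000):.2f} GB per million vectors")
```

A 4096-dimensional vector takes four times the space of a 1024-dimensional one, which is one reason to weigh model size against corpus size.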

The Qwen3 Embedding models come in three sizes: small (0.6B), medium (4B), and large (8B).
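All models support input sequences of up to 32K tokens, which suits long documents and code.
The 4B and 8B models have more layers than the 0.6B model (36 vs. 28) and larger embedding dimensions (2560 and 4096 vs. 1024), which can capture richer semantic information at a higher compute and storage cost.
All models support MRL (Matryoshka Representation Learning): the leading dimensions of an embedding carry the most information, so you can shorten vectors to a custom size to save storage and speed up search, at some cost in accuracy. The series is also multilingual, covering more than 100 languages, including programming languages.
All models are instruction-aware: you can prepend a task instruction to the query (for example, "Given a web search query, retrieve relevant passages that answer the query") to tailor the embedding to your task, which typically improves downstream performance.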
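Because MRL-trained embeddings front-load information into the leading dimensions, a full-size vector can be shortened by keeping its first values and re-normalizing to unit length. The sketch below shows that local post-processing step on a made-up 8-value vector; truncate_embedding is a hypothetical helper. The OpenAI Python SDK's embeddings.create also accepts a dimensions parameter, but whether Novita AI honors it for this model is an assumption to verify before relying on it.

```python
import math

def truncate_embedding(vec, dim):
    """Hypothetical helper: keep the first `dim` values of an MRL embedding, then re-normalize."""
    if not 0 < dim <= len(vec):
        raise ValueError(f"dim must be between 1 and {len(vec)}")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]  # unit length again, so cosine/dot products stay comparable

# Made-up 8-value stand-in for a full 4096-dimensional Qwen3 Embedding 8B vector.
full = [0.5, 0.5, 0.5, 0.5, 0.1, -0.1, 0.05, -0.05]
short = truncate_embedding(full, 4)
print(short)  # [0.5, 0.5, 0.5, 0.5]
```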

In addition, Qwen3 offers a reranking model that helps reorder query results to provide the most relevant answers.
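A typical search pipeline uses the two kinds of model together: the embedding model quickly recalls a candidate set, then a reranker scores each (query, document) pair and reorders the candidates. The sketch below shows only that control flow. toy_overlap_score is a hypothetical word-overlap placeholder standing in for a real reranker such as the Qwen3 reranking model, and the code does not call any Novita AI endpoint.

```python
from typing import Callable, Sequence

def retrieve_then_rerank(
    query: str,
    candidates: Sequence[str],
    rerank_score: Callable[[str, str], float],
    top_n: int = 3,
) -> list[str]:
    """Second stage of a search pipeline: rescore first-stage candidates and keep the best top_n."""
    ranked = sorted(candidates, key=lambda doc: rerank_score(query, doc), reverse=True)
    return ranked[:top_n]

def toy_overlap_score(query: str, doc: str) -> float:
    """Hypothetical stand-in for a reranker: fraction of query words that appear in the doc."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q)

# Candidates as they might come back from a first-stage embedding search.
candidates = [
    "pricing for gpu instances",
    "how to reset your api key",
    "api key limits and pricing",
]
print(retrieve_then_rerank("api key pricing", candidates, toy_overlap_score, top_n=2))
```

Keeping the scorer as a parameter lets you swap the placeholder for a real reranker call without changing the pipeline.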

Key Features of Qwen3 Embedding 8B

Performance of Qwen3 Embedding 8B

You can compare embedding models on the MTEB leaderboard.

How to Access Qwen 3 Embedding 8B?

Novita AI is an AI cloud platform that gives developers an easy way to deploy AI models through a simple API, along with affordable, reliable GPU cloud infrastructure for building and scaling.

In addition to Qwen3 Embedding 8B, Novita AI offers the bge-m3 embedding model for free to support the open-source community!

Step 1: Log In and Access the Model Library

Try Qwen 3 Embedding 8B Now!

Step 2: Choose Your Model and Start a Free Trial

Browse through the available options and select the model that suits your needs.

Step 3: Get Your API Key

To authenticate with the API, you need an API key. Open the “Settings” page and copy your API key as indicated in the image.

Step 4: Install the Client Library

Install the client library with your language's package manager. Novita AI exposes an OpenAI-compatible endpoint, so Python users can use the official OpenAI SDK (pip install openai).

After installation, import the library and initialize a client with your API key to start calling Novita AI models. This is an example of using the embeddings API for Python users.

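from openai import OpenAI

# Novita AI's endpoint is OpenAI-compatible, so the official OpenAI SDK works as the client.
client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",
    api_key="<Your API Key>",  # paste the key from the Settings page
)

MODEL = "qwen/qwen3-embedding-8b"

def get_embeddings(text, model=MODEL, encoding_format="float"):
    """Return the embeddings response for a string or a list of strings."""
    response = client.embeddings.create(
        model=model,
        input=text,
        encoding_format=encoding_format,
    )
    return response

# Example usage
text = "The quick brown fox jumped over the lazy dog"
result = get_embeddings(text)
vector = result.data[0].embedding  # list of floats, one per embedding dimension
print("dimension:", len(vector))
print("first values:", vector[:5])
print("usage:", result.usage)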
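Building on the example above, the sketch below ranks a few documents against an instruction-formatted query. The "Instruct: ...\nQuery:..." template follows the Qwen3 Embedding model card, which applies the instruction to queries only, not to documents. Assumptions: the endpoint accepts a list as input (as the OpenAI embeddings API does), the key is read from a NOVITA_API_KEY environment variable (a name chosen for this example), and the task description is only a sample. The API call runs only when that variable is set.

```python
import math
import os

MODEL = "qwen/qwen3-embedding-8b"
# Sample task description; adapt it to your own retrieval task.
TASK = "Given a web search query, retrieve relevant passages that answer the query"

def format_query(task: str, query: str) -> str:
    """Query template from the Qwen3 Embedding model card; documents are embedded as-is."""
    return f"Instruct: {task}\nQuery:{query}"

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def search(query, docs, api_key):
    """Embed the instructed query and all docs in one request, then rank docs by similarity."""
    from openai import OpenAI  # imported lazily so the helpers above work without the SDK

    client = OpenAI(base_url="https://api.novita.ai/v3/openai", api_key=api_key)
    # Assumption: the endpoint accepts a list input, like the OpenAI embeddings API.
    resp = client.embeddings.create(model=MODEL, input=[format_query(TASK, query), *docs])
    vectors = [item.embedding for item in sorted(resp.data, key=lambda d: d.index)]
    query_vec, doc_vecs = vectors[0], vectors[1:]
    scored = zip(docs, (cosine(query_vec, d) for d in doc_vecs))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# NOVITA_API_KEY is a hypothetical variable name used for this example.
if __name__ == "__main__" and os.environ.get("NOVITA_API_KEY"):
    docs = ["Paris is the capital of France.", "The Nile flows through Egypt."]
    for doc, score in search("What is the capital of France?", docs, os.environ["NOVITA_API_KEY"]):
        print(f"{score:.3f}  {doc}")
```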

The Qwen3 Embedding series offers scalable, instruction-aware, multilingual models (0.6B, 4B, and 8B parameters) that support long sequences and deliver state-of-the-art results: at release, the 8B model ranked first on the MTEB multilingual leaderboard with a score of 70.58. With flexible customization and a dedicated reranking model, the Qwen3 Embedding series is well suited to real-world applications that require efficient, accurate semantic understanding.

Frequently Asked Questions

What is an embedding?

An embedding is a vector of numbers that represents input data (such as text, images, or audio) in a compact space where inputs with similar meaning lie close together.

What problems do embedding models solve?

They enable semantic search and retrieval, improving relevance in search engines, recommendations, customer support, and more by understanding the meaning behind queries and documents.

What is the Qwen3 Embedding Model?

Qwen3 Embedding is a family of three models (0.6B, 4B, and 8B parameters) that support long inputs (up to 32K tokens), more than 100 languages, Matryoshka Representation Learning (MRL) for custom embedding sizes, and instruction awareness for task-specific customization. At release, the 8B model ranked first on the MTEB multilingual leaderboard with a score of 70.58. You can use it on Novita AI!

Novita AI is an all-in-one cloud platform that empowers your AI ambitions. Integrated APIs, serverless, and GPU instances: the cost-effective tools you need. Eliminate infrastructure, start free, and make your AI vision a reality.
