Reranker models play a critical role in enhancing the accuracy of AI search systems by refining the order of initially retrieved documents.
The Qwen 3 Reranker 8B shows outstanding benchmark performance across multilingual and code-based tasks. With easy access via Novita AI's API platform, developers can now integrate powerful reranking capabilities into their applications efficiently.
What Is a Reranker Model?
A Reranker model is a specialized AI model that reorders a set of initially retrieved documents or items based on their relevance to a specific query. Typically, after an initial retrieval phase (using methods like BM25 or embedding-based search), a Reranker evaluates the top-k results more precisely to ensure the most relevant items are prioritized.

Rerankers assign a relevance score to each query-document pair. By scoring every retrieved document against the query, they improve retrieval accuracy: only the most relevant subset of the initially retrieved documents is kept.
Problems Reranker Models Solve
- Enhancing Relevance: They refine initial search results to better match user intent.
- Reducing Noise: By filtering out less relevant items, they improve the quality of information presented.
- Improving RAG Systems: In RAG pipelines, rerankers ensure that the most pertinent documents are used for generating responses.
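The refinement step these bullets describe can be sketched in a few lines. This is a minimal illustration, not a real model: the `overlap_score` function here is a hypothetical stand-in for a genuine reranker such as Qwen3-Reranker.

```python
def rerank(query, docs, score, top_k=3):
    """Sort retrieved docs by relevance score, keeping the top_k best."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:top_k]

# Toy scoring function: word overlap between query and document.
# A real pipeline would call a cross-encoder reranker model instead.
def overlap_score(query, doc):
    return len(set(query.lower().split()) & set(doc.lower().split()))

docs = [
    "The capital of France is Paris.",
    "Reranker models reorder retrieved documents.",
    "Paris is known for the Eiffel Tower.",
]
print(rerank("What is the capital of France?", docs, overlap_score, top_k=2))
```

Swapping `overlap_score` for a model-backed scoring call turns this toy into the post-retrieval refinement stage of a RAG pipeline.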
How to Evaluate Reranker Models
- MTEB-R: English retrieval tasks from the MTEB (Massive Text Embedding Benchmark).
- CMTEB-R: Chinese retrieval tasks from MTEB (focus on Chinese language performance).
- MMTEB-R: Multilingual retrieval tasks (evaluate across multiple languages).
- MLDR: Multilingual Long Document Retrieval (tests retrieval of long texts in various languages).
- MTEB-Code: Benchmarks for code-related retrieval tasks (e.g., code search, understanding).
- FollowIR: Measures how well models follow complex user instructions in search queries.
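Most of these benchmarks report ranking-quality metrics such as nDCG@k, which rewards placing relevant documents near the top of the list. A small, self-contained illustration with binary relevance labels (the data is made up):

```python
import math

def ndcg_at_k(relevances, k):
    """nDCG@k for a ranked list of relevance labels (1 = relevant, 0 = not)."""
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))
    ideal = sorted(relevances, reverse=True)
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0

# Before reranking: the only relevant document sits at rank 3.
print(ndcg_at_k([0, 0, 1, 0], 3))  # 0.5
# After reranking: the relevant document is moved to rank 1.
print(ndcg_at_k([1, 0, 0, 0], 3))  # 1.0
```

Moving the relevant document from rank 3 to rank 1 doubles the score here, which is exactly the kind of gain a reranker is meant to deliver.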
Reranker Models vs. Embedding Models
| Aspect | Embedding Models | Reranker Models |
|---|---|---|
| Function | Retrieve documents based on vector similarity | Reorder retrieved documents based on relevance |
| Efficiency | High (suitable for large-scale retrieval) | Lower (used for reordering a smaller set) |
| Accuracy | Moderate | High |
| Use Case | Initial retrieval | Post-retrieval refinement |
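The two stages complement each other: a fast embedding search narrows a large corpus down to a handful of candidates, and the reranker re-scores only that small set. A toy two-stage pipeline, where cosine similarity over made-up 3-dimensional vectors stands in for an embedding model and a hard-coded `rerank_score` dictionary stands in for a real reranker:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Stage 1: cheap vector search over the whole (toy) corpus.
doc_vecs = {"doc_a": [0.9, 0.1, 0.0], "doc_b": [0.7, 0.6, 0.1], "doc_c": [0.0, 0.2, 0.9]}
query_vec = [1.0, 0.2, 0.0]
candidates = sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)[:2]

# Stage 2: the expensive reranker scores only the surviving candidates.
# These scores are made up; a real system would call a cross-encoder model here.
rerank_score = {"doc_a": 0.3, "doc_b": 0.95, "doc_c": 0.1}
final = sorted(candidates, key=lambda d: rerank_score[d], reverse=True)
print(final)
```

Note how the reranker flips the order of the two finalists: the embedding stage preferred `doc_a`, but the deeper scoring promotes `doc_b`, which is the whole point of post-retrieval refinement.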
What Are Qwen 3 Reranker Models?
| Model | Size | Layers | Sequence Length | Instruct Aware |
|---|---|---|---|---|
| Qwen3-Reranker-0.6B | 0.6B | 28 | 32K | Yes |
| Qwen3-Reranker-4B | 4B | 36 | 32K | Yes |
| Qwen3-Reranker-8B | 8B | 36 | 32K | Yes |
How Do Qwen 3 Reranker Models Work?

Embedding
Goal: Turn text into a vector so you can search and compare efficiently.
- Input format: {Instruction} + {Query} (or {Doc}), ending with an [EOS] token, so the model sees the instruction and the text in one combined input.
- The input runs through the Qwen3 model, and the hidden state at the final [EOS] position is taken as a "summary vector" of the whole input.
- That vector becomes the embedding: a numeric representation of the text that can be compared with others.
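In other words, the embedding is simply the model's hidden state at the final [EOS] position. A toy illustration of that extraction step, with short made-up vectors standing in for the transformer's per-token hidden states:

```python
# Toy per-token hidden states for the sequence "{Query} [EOS]".
# In the real model each vector would have thousands of dimensions.
hidden_states = [
    [0.1, 0.4],   # token 1
    [0.3, 0.2],   # token 2
    [0.8, 0.5],   # [EOS] -- its hidden state summarizes the whole input
]
embedding = hidden_states[-1]  # take the [EOS] position as the embedding
print(embedding)
```

The actual model would produce this list as its last-layer output; the embedding step is just indexing the final position.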
Reranker
Goal: Give a smart score that tells how well the document matches the query.
- Input format: {Instruction} + {Query} + {Doc}, followed by an "Assistant:" prompt. This is a more detailed input: Qwen3 sees the query and the document together, like reading them side by side.
- The model uses a cross-encoder setup, deeply comparing the two texts in a single forward pass.
- The LM head (language model head) then outputs a score, e.g. the probability of the token "yes".
- This score tells us: "How relevant is this document to the query?"
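The "probability of yes" in that last step can be computed from just two of the logits the LM head produces: the ones for the tokens "yes" and "no". A minimal sketch of that final scoring step, using made-up logit values:

```python
import math

def relevance_score(yes_logit, no_logit):
    """Softmax over the 'yes'/'no' logits; returns P('yes') as the relevance score."""
    e_yes = math.exp(yes_logit)
    e_no = math.exp(no_logit)
    return e_yes / (e_yes + e_no)

# Made-up logits for two query-document pairs.
print(round(relevance_score(2.0, -1.0), 3))  # strongly relevant pair -> 0.953
print(round(relevance_score(-0.5, 1.5), 3))  # weakly relevant pair -> 0.119
```

Because the score is a probability between 0 and 1, it can be used directly to sort candidate documents.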
Benchmark of Qwen 3 Reranker Models
| Model | Param | MTEB-R | CMTEB-R | MMTEB-R | MLDR | MTEB-Code | FollowIR |
|---|---|---|---|---|---|---|---|
| Jina-multilingual-reranker-v2-base | 0.3B | 58.22 | 63.37 | 63.73 | 39.66 | 58.98 | -0.68 |
| gte-multilingual-reranker-base | 0.3B | 59.51 | 74.08 | 59.44 | 66.33 | 54.18 | -1.64 |
| BGE-reranker-v2-m3 | 0.6B | 57.03 | 72.16 | 58.36 | 59.51 | 41.38 | -0.01 |
| Qwen3-Reranker-0.6B | 0.6B | 65.80 | 71.31 | 66.36 | 67.28 | 73.42 | 5.41 |
| Qwen3-Reranker-4B | 4B | 69.76 | 75.94 | 72.74 | 69.97 | 81.20 | 14.84 |
| Qwen3-Reranker-8B | 8B | 69.02 | 77.45 | 72.94 | 70.19 | 81.22 | 8.05 |
You can also check the evaluation of embedding models on the MTEB leaderboard.
How to Access Qwen 3 Reranker Models?
Novita AI is an AI cloud platform that lets developers deploy AI models through a simple API, while also providing affordable and reliable GPU cloud infrastructure for building and scaling.
In addition to Qwen 3 Reranker 8B and Qwen 3 Embedding 8B, Novita AI also provides bge-m3 for free to support the open-source community!
Step 1: Log In and Access the Model Library
Log in to your account and click on the Model Library button.

Step 2: Choose Your Model and Start a Free Trial
Browse through the available options and select the model that suits your needs.

Step 3: Get Your API Key
To authenticate with the API, you will need an API key. On the "Settings" page, you can copy the API key as indicated in the image.

Step 4: Install the Client Library
Install the client library using the package manager for your programming language.

After installation, import the necessary libraries into your development environment and initialize the client with your API key to start interacting with Novita AI models. Below is an example of using the chat completions API in Python.
```python
from openai import OpenAI

base_url = "https://api.novita.ai/v3/openai"
api_key = "<Your API Key>"
model = "qwen/qwen3-reranker-8b"

# The endpoint is OpenAI-compatible, so the standard OpenAI client works.
client = OpenAI(
    base_url=base_url,
    api_key=api_key,
)

stream = True  # or False
max_tokens = 1000

chat_completion_res = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "user",
            "content": "Hi there!",
        }
    ],
    stream=stream,
    max_tokens=max_tokens,
)

# Streamed responses arrive chunk by chunk; non-streamed ones in a single object.
if stream:
    for chunk in chat_completion_res:
        print(chunk.choices[0].delta.content or "", end="")
else:
    print(chat_completion_res.choices[0].message.content)
```
As AI applications demand more precise understanding of user intent, reranking models have become essential tools for delivering smarter search results. Acting as a second layer of intelligence after initial retrieval, rerankers fine-tune document rankings using deeper contextual analysis. The Qwen 3 Reranker series sets a new benchmark in this space, offering impressive performance across languages, long documents, and even code retrieval tasks. With deployment made simple through Novita AI, developers can harness these advanced models without heavy infrastructure—making high-accuracy retrieval more accessible than ever.
Frequently Asked Questions
What is a reranker model?
A reranker reorders a list of retrieved documents by scoring their relevance to a query, improving precision in AI search systems.
How do reranker models differ from embedding models?
- Embedding Model: Converts each text into a vector and compares them using similarity.
- Reranker Model: Reads the query and document together and gives a fine-grained relevance score.
How does Qwen3-Reranker-8B perform on benchmarks?
Qwen3-Reranker-8B achieves top-tier scores: MTEB-R: 69.02, CMTEB-R: 77.45, MTEB-Code: 81.22. It outperforms popular models like BGE and GTE in multiple categories.
Novita AI is an all-in-one cloud platform that empowers your AI ambitions. Integrated APIs, serverless inference, and GPU instances give you the cost-effective tools you need. Eliminate infrastructure overhead, start for free, and make your AI vision a reality.
Recommended Reading
- How many H100 GPUs are needed to Fine-tune DeepSeek R1?
- Choose Between Qwen 3 and Qwen 2.5: Lightweight Efficiency or Advanced Reasoning Power?
- Qwen 2.5 7B VRAM Tips Every Dev Should Know