Reranker models play a critical role in enhancing the accuracy of AI search systems by refining the order of initially retrieved documents.
The Qwen 3 Reranker 8B shows outstanding benchmark performance across multilingual and code-based tasks. With easy access via Novita AI's API platform, developers can now integrate powerful reranking capabilities into their applications efficiently.
What Is a Reranker Model?
A Reranker model is a specialized AI model that reorders a set of initially retrieved documents or items based on their relevance to a specific query. Typically, after an initial retrieval phase (using methods like BM25 or embedding-based search), a Reranker evaluates the top-k results more precisely to ensure the most relevant items are prioritized.

Rerankers assign a relevance score to each query-document pair. By scoring every retrieved document against the query, they improve retrieval accuracy: only the most relevant subset of the initially retrieved documents is kept.
Problems Reranker Models Solve
- Enhancing Relevance: They refine initial search results to better match user intent.
- Reducing Noise: By filtering out less relevant items, they improve the quality of information presented.
- Improving RAG Systems: In RAG pipelines, rerankers ensure that the most pertinent documents are used for generating responses.
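The refinement step these bullets describe can be sketched in a few lines. This is a minimal illustration, not a real model: the `overlap_score` function here is a hypothetical stand-in for a genuine reranker such as Qwen3-Reranker.

```python
def rerank(query, docs, score, top_k=3):
    """Sort retrieved docs by relevance score, keeping the top_k best."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:top_k]

# Toy scoring function: word overlap between query and document.
# A real pipeline would call a cross-encoder reranker model instead.
def overlap_score(query, doc):
    return len(set(query.lower().split()) & set(doc.lower().split()))

docs = [
    "The capital of France is Paris.",
    "Reranker models reorder retrieved documents.",
    "Paris is known for the Eiffel Tower.",
]
print(rerank("What is the capital of France?", docs, overlap_score, top_k=2))
```

Swapping `overlap_score` for a model-backed scoring call turns this toy into the post-retrieval refinement stage of a RAG pipeline.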
How to Evaluate Reranker Models
- MTEB-R: English retrieval tasks from the MTEB (Massive Text Embedding Benchmark).
- CMTEB-R: Chinese retrieval tasks from MTEB (focus on Chinese language performance).
- MMTEB-R: Multilingual retrieval tasks (evaluate across multiple languages).
- MLDR: Multilingual Long Document Retrieval (tests retrieval of long texts in various languages).
- MTEB-Code: Benchmarks for code-related retrieval tasks (e.g., code search, understanding).
- FollowIR: Measures how well models follow complex user instructions in search queries.
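Most of these benchmarks report ranking-quality metrics such as nDCG@k, which rewards placing relevant documents near the top of the list. A small, self-contained illustration with binary relevance labels (the data is made up):

```python
import math

def ndcg_at_k(relevances, k):
    """nDCG@k for a ranked list of relevance labels (1 = relevant, 0 = not)."""
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))
    ideal = sorted(relevances, reverse=True)
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0

# Before reranking: the only relevant document sits at rank 3.
print(ndcg_at_k([0, 0, 1, 0], 3))  # 0.5
# After reranking: the relevant document is moved to rank 1.
print(ndcg_at_k([1, 0, 0, 0], 3))  # 1.0
```

Moving the relevant document from rank 3 to rank 1 doubles the score here, which is exactly the kind of gain a reranker is meant to deliver.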
Reranker Models vs. Embedding Models
| Aspect | Embedding Models | Reranker Models |
|---|---|---|
| Function | Retrieve documents based on vector similarity | Reorder retrieved documents based on relevance |
| Efficiency | High (suitable for large-scale retrieval) | Lower (used for reordering a smaller set) |
| Accuracy | Moderate | High |
| Use Case | Initial retrieval | Post-retrieval refinement |
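The two stages complement each other: a fast embedding search narrows a large corpus down to a handful of candidates, and the reranker re-scores only that small set. A toy two-stage pipeline, where cosine similarity over made-up 3-dimensional vectors stands in for an embedding model and a hard-coded `rerank_score` dictionary stands in for a real reranker:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Stage 1: cheap vector search over the whole (toy) corpus.
doc_vecs = {"doc_a": [0.9, 0.1, 0.0], "doc_b": [0.7, 0.6, 0.1], "doc_c": [0.0, 0.2, 0.9]}
query_vec = [1.0, 0.2, 0.0]
candidates = sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)[:2]

# Stage 2: the expensive reranker scores only the surviving candidates.
# These scores are made up; a real system would call a cross-encoder model here.
rerank_score = {"doc_a": 0.3, "doc_b": 0.95, "doc_c": 0.1}
final = sorted(candidates, key=lambda d: rerank_score[d], reverse=True)
print(final)
```

Note how the reranker flips the order of the two finalists: the embedding stage preferred `doc_a`, but the deeper scoring promotes `doc_b`, which is the whole point of post-retrieval refinement.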
What Are Qwen 3 Reranker Models?
| Model | Size | Layers | Sequence Length | Instruct Aware |
|---|---|---|---|---|
| Qwen3-Reranker-0.6B | 0.6B | 28 | 32K | Yes |
| Qwen3-Reranker-4B | 4B | 36 | 32K | Yes |
| Qwen3-Reranker-8B | 8B | 36 | 32K | Yes |
How Do Qwen 3 Reranker Models Work?

Embedding
Goal: Turn text into a vector so you can search and compare efficiently.
- Input format: {Instruction} + {Query} (or {Doc}), ending with an [EOS] token, so the model sees the instruction and the text in one combined input.
- The input runs through the Qwen3 model, and the hidden state at the final [EOS] position is taken as a "summary vector" of the whole input.
- That vector becomes the embedding: a numeric representation of the text that can be compared with others.
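In other words, the embedding is simply the model's hidden state at the final [EOS] position. A toy illustration of that extraction step, with short made-up vectors standing in for the transformer's per-token hidden states:

```python
# Toy per-token hidden states for the sequence "{Query} [EOS]".
# In the real model each vector would have thousands of dimensions.
hidden_states = [
    [0.1, 0.4],   # token 1
    [0.3, 0.2],   # token 2
    [0.8, 0.5],   # [EOS] -- its hidden state summarizes the whole input
]
embedding = hidden_states[-1]  # take the [EOS] position as the embedding
print(embedding)
```

The actual model would produce this list as its last-layer output; the embedding step is just indexing the final position.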
Reranker
Goal: Give a smart score that tells how well the document matches the query.
- Input format: {Instruction} + {Query} + {Doc}, followed by an "Assistant:" prompt. This is a more detailed input: Qwen3 sees the query and the document together, like reading them side by side.
- The model uses a cross-encoder setup, deeply comparing the two texts in a single forward pass.
- The LM head (language model head) then outputs a score, e.g. the probability of the token "yes".
- This score tells us: "How relevant is this document to the query?"
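The "probability of yes" in that last step can be computed from just two of the logits the LM head produces: the ones for the tokens "yes" and "no". A minimal sketch of that final scoring step, using made-up logit values:

```python
import math

def relevance_score(yes_logit, no_logit):
    """Softmax over the 'yes'/'no' logits; returns P('yes') as the relevance score."""
    e_yes = math.exp(yes_logit)
    e_no = math.exp(no_logit)
    return e_yes / (e_yes + e_no)

# Made-up logits for two query-document pairs.
print(round(relevance_score(2.0, -1.0), 3))  # strongly relevant pair -> 0.953
print(round(relevance_score(-0.5, 1.5), 3))  # weakly relevant pair -> 0.119
```

Because the score is a probability between 0 and 1, it can be used directly to sort candidate documents.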
Benchmark of Qwen 3 Reranker Models
| Model | Param | MTEB-R | CMTEB-R | MMTEB-R | MLDR | MTEB-Code | FollowIR |
|---|---|---|---|---|---|---|---|
| Jina-multilingual-reranker-v2-base | 0.3B | 58.22 | 63.37 | 63.73 | 39.66 | 58.98 | -0.68 |
| gte-multilingual-reranker-base | 0.3B | 59.51 | 74.08 | 59.44 | 66.33 | 54.18 | -1.64 |
| BGE-reranker-v2-m3 | 0.6B | 57.03 | 72.16 | 58.36 | 59.51 | 41.38 | -0.01 |
| Qwen3-Reranker-0.6B | 0.6B | 65.80 | 71.31 | 66.36 | 67.28 | 73.42 | 5.41 |
| Qwen3-Reranker-4B | 4B | 69.76 | 75.94 | 72.74 | 69.97 | 81.20 | 14.84 |
| Qwen3-Reranker-8B | 8B | 69.02 | 77.45 | 72.94 | 70.19 | 81.22 | 8.05 |
You can also check the evaluation of embedding models on the MTEB leaderboard.
How to Access Qwen 3 Reranker Models?
Novita AI is an AI cloud platform that lets developers deploy AI models through a simple API, while also providing affordable and reliable GPU cloud infrastructure for building and scaling.
In addition to Qwen 3 Reranker 8B and Qwen 3 Embedding 8B, Novita AI also provides bge-m3 for free to support the open-source community!
Step 1: Log In and Access the Model Library
Log in to your account and click on the Model Library button.

Step 2: Choose Your Model and Start a Free Trial
Browse through the available options and select the model that suits your needs.

Step 3: Get Your API Key
To authenticate with the API, you will need an API key. On the "Settings" page, you can copy the API key as indicated in the image.

Step 4: Install the Client Library
Install the client library using the package manager for your programming language.

After installation, import the necessary libraries into your development environment and initialize the client with your API key to start interacting with Novita AI models. Below is an example of using the chat completions API in Python.
```python
from openai import OpenAI

base_url = "https://api.novita.ai/v3/openai"
api_key = "<Your API Key>"
model = "qwen/qwen3-reranker-8b"

# The endpoint is OpenAI-compatible, so the standard OpenAI client works.
client = OpenAI(
    base_url=base_url,
    api_key=api_key,
)

stream = True  # or False
max_tokens = 1000

chat_completion_res = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "user",
            "content": "Hi there!",
        }
    ],
    stream=stream,
    max_tokens=max_tokens,
)

# Streamed responses arrive chunk by chunk; non-streamed ones in a single object.
if stream:
    for chunk in chat_completion_res:
        print(chunk.choices[0].delta.content or "", end="")
else:
    print(chat_completion_res.choices[0].message.content)
```
As AI applications demand more precise understanding of user intent, reranking models have become essential tools for delivering smarter search results. Acting as a second layer of intelligence after initial retrieval, rerankers fine-tune document rankings using deeper contextual analysis. The Qwen 3 Reranker series sets a new benchmark in this space, offering impressive performance across languages, long documents, and even code retrieval tasks. With deployment made simple through Novita AI, developers can harness these advanced models without heavy infrastructure—making high-accuracy retrieval more accessible than ever.
Frequently Asked Questions
What is a reranker model?
A reranker reorders a list of retrieved documents by scoring their relevance to a query, improving precision in AI search systems.
How do reranker models differ from embedding models?
- Embedding Model: Converts each text into a vector and compares them using similarity.
- Reranker Model: Reads the query and document together and gives a fine-grained relevance score.
How does Qwen3-Reranker-8B perform on benchmarks?
Qwen3-Reranker-8B achieves top-tier scores: MTEB-R: 69.02, CMTEB-R: 77.45, MTEB-Code: 81.22. It outperforms popular models like BGE and GTE in multiple categories.
Novita AI is an all-in-one cloud platform that empowers your AI ambitions. Integrated APIs, serverless inference, and GPU instances give you the cost-effective tools you need. Eliminate infrastructure overhead, start for free, and make your AI vision a reality.
Recommended Reading
- How many H100 GPUs are needed to Fine-tune DeepSeek R1?
- Choose Between Qwen 3 and Qwen 2.5: Lightweight Efficiency or Advanced Reasoning Power?
- Qwen 2.5 7B VRAM Tips Every Dev Should Know