
Conversation

Contributor

@noooop noooop commented Jun 6, 2025

Summary

  • Qwen3 Embedding
    • Qwen/Qwen3-Embedding-0.6B
    • Qwen/Qwen3-Embedding-4B
    • Qwen/Qwen3-Embedding-8B
  • Qwen3 Reranker
    • Qwen/Qwen3-Reranker-0.6B
    • Qwen/Qwen3-Reranker-4B
    • Qwen/Qwen3-Reranker-8B
    • tomaarsen/Qwen3-Reranker-0.6B-seq-cls

Usage

  • Qwen3 Embedding
vllm serve Qwen/Qwen3-Embedding-0.6B

curl

curl http://127.0.0.1:8000/v1/embeddings \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "input": "Follow the white rabbit.",
    "model": "Qwen/Qwen3-Embedding-0.6B",
    "encoding_format": "float"
  }'
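
The same request can be made via the OpenAI Python client. A minimal sketch, assuming the openai package is installed and the server above is running (vLLM accepts any placeholder API key unless one was configured with --api-key):

from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="EMPTY")

resp = client.embeddings.create(
    model="Qwen/Qwen3-Embedding-0.6B",
    input="Follow the white rabbit.",
    encoding_format="float",
)
print(resp.data[0].embedding[:8])  # first few dimensions of the embedding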
  • Qwen3 Reranker

Caution

Please use the query_template and document_template to format the query and document for better reranker results.
Without the templates, the results are almost random; see the offline example below. PTAL #19344

For models that have been converted to Qwen3ForSequenceClassification, such as tomaarsen/Qwen3-Reranker-0.6B-seq-cls:

vllm serve tomaarsen/Qwen3-Reranker-0.6B-seq-cls

/score

curl http://127.0.0.1:8000/score \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "text_1": "ping",
    "text_2": "pong",
    "model": "tomaarsen/Qwen3-Reranker-0.6B-seq-cls"
  }'

expected output

{"id":"score-ee337e20e932467a83792d220614a7cd","object":"list","created":1749527048,"model":"tomaarsen/Qwen3-Reranker-0.6B-seq-cls","data":[{"index":0,"object":"score","score":0.06634521484375}],"usage":{"prompt_tokens":2,"total_tokens":2,"completion_tokens":0,"prompt_tokens_details":null}}

/rerank

curl http://127.0.0.1:8000/rerank \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "ping",
    "documents": ["pong"],
    "model": "tomaarsen/Qwen3-Reranker-0.6B-seq-cls"
  }'

expected output

{"id":"rerank-fe06b692387444b7a56e282944f285f9","model":"tomaarsen/Qwen3-Reranker-0.6B-seq-cls","usage":{"total_tokens":2},"results":[{"index":0,"document":{"text":"pong"},"relevance_score":0.06634521484375}]}

For the official model:

vllm serve Qwen/Qwen3-Reranker-0.6B --hf_overrides '{"architectures": ["Qwen3ForSequenceClassification"],"classifier_from_token": ["no", "yes"],"is_original_qwen3_reranker": true}'

Why do we need hf_overrides:
Qwen3-Reranker is a language model that performs reranking using the logits of the "no" and "yes" tokens.
vLLM converts it to Qwen3ForSequenceClassification at load time for better performance.

  • First, "architectures": ["Qwen3ForSequenceClassification"] manually routes the model to Qwen3ForSequenceClassification.
  • Then, "classifier_from_token": ["no", "yes"] extracts the vectors corresponding to those tokens from lm_head.
  • Finally, the two vectors are converted into a single vector; this conversion logic is enabled by "is_original_qwen3_reranker": true.

Once vllm serve is started correctly, calling the official model (with the corresponding model name) works the same as calling a model that has already been converted to Qwen3ForSequenceClassification.
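
To illustrate the conversion step, here is a minimal sketch of the idea (not the actual vLLM code; the tensor sizes are illustrative, and the token ids 2152 for "no" and 9693 for "yes" follow the discussion below):

import torch

def convert_to_classifier_weight(lm_head_weight: torch.Tensor,
                                 no_id: int, yes_id: int) -> torch.Tensor:
    # P(yes | {no, yes}) = softmax([h·w_no, h·w_yes])[1]
    #                    = sigmoid(h · (w_yes - w_no)),
    # so a single score head with weight (w_yes - w_no) plus a sigmoid
    # reproduces the original two-token scoring.
    return lm_head_weight[yes_id] - lm_head_weight[no_id]

hidden = torch.randn(1, 1024)        # last-token hidden state (illustrative size)
lm_head = torch.randn(151669, 1024)  # vocab_size x hidden_size
w = convert_to_classifier_weight(lm_head, no_id=2152, yes_id=9693)
print(torch.sigmoid(hidden @ w))     # relevance score in (0, 1)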

/score

curl http://127.0.0.1:8000/score \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "text_1": "ping",
    "text_2": "pong",
    "model": "Qwen/Qwen3-Reranker-0.6B"
  }'

expected output

{"id":"score-7dbe101346ea4aeea4b85aa7971ddf8f","object":"list","created":1749527323,"model":"Qwen/Qwen3-Reranker-0.6B","data":[{"index":0,"object":"score","score":0.0673828125}],"usage":{"prompt_tokens":2,"total_tokens":2,"completion_tokens":0,"prompt_tokens_details":null}}

/rerank

curl http://127.0.0.1:8000/rerank \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "ping",
    "documents": ["pong"],
    "model": "Qwen/Qwen3-Reranker-0.6B"
  }'

expected output

{"id":"rerank-43ddc0f96f174ae4a5eef07d51a8defd","model":"Qwen/Qwen3-Reranker-0.6B","usage":{"total_tokens":2},"results":[{"index":0,"document":{"text":"pong"},"relevance_score":0.0673828125}]}

Offline usage, formatting the query & document:

from vllm import LLM

model_name = "Qwen/Qwen3-Reranker-0.6B"

# What is the difference between the official original version and one
# that has been converted into a sequence classification model?
# Qwen3-Reranker is a language model that performs reranking using the
# logits of the "no" and "yes" tokens.
# That requires computing logits over all 151669 vocabulary tokens, which
# is extremely inefficient, not to mention incompatible with the vLLM
# score API.
# A method for converting the original model into a sequence classification
# model was proposed. See:
# https://huggingface.co/Qwen/Qwen3-Reranker-0.6B/discussions/3
# Models converted offline this way are not only more efficient and
# compatible with the vLLM score API, but also have more concise init
# parameters, for example:
# model = LLM(model="tomaarsen/Qwen3-Reranker-0.6B-seq-cls", task="score")

# If you want to load the official original version, the init parameters
# are as follows.

model = LLM(
    model=model_name,
    task="score",
    hf_overrides={
        "architectures": ["Qwen3ForSequenceClassification"],
        "classifier_from_token": ["no", "yes"],
        "is_original_qwen3_reranker": True,
    },
)

# Why do we need hf_overrides for the official original version:
# vLLM converts it to Qwen3ForSequenceClassification at load time for
# better performance.
# - First, `"architectures": ["Qwen3ForSequenceClassification"]` manually
#   routes the model to Qwen3ForSequenceClassification.
# - Then, `"classifier_from_token": ["no", "yes"]` extracts the vectors
#   corresponding to those tokens from lm_head.
# - Finally, the two vectors are converted into a single vector; this
#   conversion logic is enabled by `"is_original_qwen3_reranker": True`.

# Please use the query_template and document_template to format the query and
# document for better reranker results.

prefix = '<|im_start|>system\nJudge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be "yes" or "no".<|im_end|>\n<|im_start|>user\n'
suffix = "<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n"

query_template = "{prefix}<Instruct>: {instruction}\n<Query>: {query}\n"
document_template = "<Document>: {doc}{suffix}"

if __name__ == "__main__":
    instruction = (
        "Given a web search query, retrieve relevant passages that answer the query"
    )

    queries = [
        "What is the capital of China?",
        "Explain gravity",
    ]

    documents = [
        "The capital of China is Beijing.",
        "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
    ]

    queries = [
        query_template.format(prefix=prefix, instruction=instruction, query=query)
        for query in queries
    ]
    documents = [document_template.format(doc=doc, suffix=suffix) for doc in documents]

    outputs = model.score(queries, documents)

    print([output.outputs.score for output in outputs])

requests demo, formatting the query & document:

import requests

url = "http://127.0.0.1:8000/score"
MODEL_NAME = "tomaarsen/Qwen3-Reranker-0.6B-seq-cls"

# Please use the query_template and document_template to format the query and
# document for better reranker results.

prefix = '<|im_start|>system\nJudge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be "yes" or "no".<|im_end|>\n<|im_start|>user\n'
suffix = "<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n"

query_template = "{prefix}<Instruct>: {instruction}\n<Query>: {query}\n"
document_template = "<Document>: {doc}{suffix}"

instruction = (
    "Given a web search query, retrieve relevant passages that answer the query"
)

queries = [
    "What is the capital of China?",
    "Explain gravity",
]

documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
]

queries = [
    query_template.format(prefix=prefix, instruction=instruction, query=query)
    for query in queries
]
documents = [
    document_template.format(doc=doc, suffix=suffix) for doc in documents
]

response = requests.post(url,
                         json={
                             "model": MODEL_NAME,
                             "text_1": queries,
                             "text_2": documents,
                             "truncate_prompt_tokens": -1,
                         }).json()

print(response)

expected output

{'id': 'score-14f698f021b9434482ec3d94a5757e11', 'object': 'list', 'created': 1749786173, 'model': 'tomaarsen/Qwen3-Reranker-0.6B-seq-cls', 'data': [{'index': 0, 'object': 'score', 'score': 0.99951171875}, {'index': 1, 'object': 'score', 'score': 0.99951171875}], 'usage': {'prompt_tokens': 189, 'total_tokens': 189, 'completion_tokens': 0, 'prompt_tokens_details': None}}

Legacy

For Embedding:
After merging
https://huggingface.co/Qwen/Qwen3-Embedding-0.6B/discussions/2
and embeddings-benchmark/mteb#2769 (comment),
Qwen3-Embedding can already produce results close to those of SentenceTransformers.

For Reranker

  • Qwen3ForCausalLM: Qwen3 Embedding & Reranker both use the same architecture, Qwen3ForCausalLM; vLLM currently has no way to let a single architecture support Embedding and Reranker at the same time.
  • SupportsCrossEncoding: For the Reranker, the biggest problem is that the score task treats a Qwen3ForCausalLM model like an embedding model, computing embeddings and cosine distance. This is definitely not what is wanted.
  • Qwen3ForSequenceClassification: Perhaps ultimately we need something like --hf-overrides '{"architectures": ["Qwen3ForSequenceClassification"]}' to get the Qwen3 Reranker to run correctly.
  • classifier_from_token: A more efficient approach is to extract token_false_id = 2152 and token_true_id = 9693 and treat scoring as a 2-class classification task rather than the current 151669-class one. We need a new interface (classifier_from_token) to implement this; see the sketch after this list.
  • converted to a single-label classification model: The 2-way classifier is actually just a 1-way head classifier. https://huggingface.co/Qwen/Qwen3-Reranker-0.6B/discussions/3
  • format_instruction: Should format_instruction be handled by users or by vLLM, and where should this code live? I temporarily added a process_inputs callback for LLM.score, but the online version (OpenAI-Compatible Server) has no way to handle it. Please format the strings using the templates above.
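
For reference, a sketch of the original two-token scoring that the classifier_from_token conversion replaces (following the official usage pattern; the dummy logits here are purely illustrative):

import torch
import torch.nn.functional as F

token_false_id, token_true_id = 2152, 9693  # "no" / "yes" in the Qwen3 vocab

def score_from_logits(last_token_logits: torch.Tensor) -> torch.Tensor:
    # last_token_logits: [batch, vocab_size] logits at the final position
    pair = torch.stack(
        [last_token_logits[:, token_false_id],
         last_token_logits[:, token_true_id]],
        dim=1,
    )
    # probability assigned to "yes" among {"no", "yes"}
    return F.log_softmax(pair, dim=1)[:, 1].exp()

logits = torch.randn(2, 151669)  # dummy logits for illustration
print(score_from_logits(logits))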

FIX #19229
FIX #19252
FIX #19366

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Hello @noooop, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

Hello! Gemini here, providing a summary of this pull request. This PR aims to add initial support for Qwen3 Embedding and Reranker models within vLLM. Based on the PR title and description, the primary goal is to enable these specific Qwen3 model types. The author notes that the current implementation, particularly for the Reranker task, is a "dirty fix" due to underlying architectural challenges where both Embedding and Reranker models share the same Qwen3ForCausalLM architecture in Hugging Face, which vLLM isn't currently designed to handle cleanly for these distinct tasks. The PR focuses on implementing a specific scoring mechanism for the Reranker task by leveraging the existing causal language model structure.

Highlights

  • Qwen3 Model Support: Adds initial support for Qwen3 Embedding and Reranker models.
  • Reranker Scoring Logic: Implements a specific scoring method for the Reranker task by extracting logits for predefined true/false tokens (2152 and 9693) and calculating a score based on their probabilities.
  • Cross-Encoding Interface: The Qwen3ForCausalLM class now implements the SupportsCrossEncoding interface, indicating its capability for tasks like reranking.
  • Known Limitations: The author explicitly mentions several existing issues with this approach, including the challenge of supporting both Embedding and Reranker tasks with the same underlying architecture and the current scoring method being a workaround.

Changelog

  • vllm/model_executor/models/qwen3.py
    • Added imports for pooling-related classes (LastPool, PoolingMetadata, PoolerOutput, PoolingSequenceGroupOutput) and the SupportsCrossEncoding interface.
    • Updated the Qwen3ForCausalLM class definition to inherit from SupportsCrossEncoding.
    • In the Qwen3ForCausalLM constructor, initialized a LastPool layer if the model task is set to "score".
    • Added a pooler method to Qwen3ForCausalLM which implements the custom scoring logic for the reranker task using specific token logits (2152 and 9693).


github-actions bot commented Jun 6, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This PR adds support for Qwen3 Embedding and Reranker models by introducing specific handling for the 'score' task within the Qwen3ForCausalLM architecture. The approach for reranking is acknowledged as a workaround and involves using logits of specific 'true'/'false' tokens. The changes are targeted and address the issues outlined in the PR description. However, the hardcoded token IDs are a concern for maintainability and generalizability.

Summary of Findings

  • Hardcoded Token IDs: The pooler method in vllm/model_executor/models/qwen3.py uses hardcoded token IDs (2152 for false, 9693 for true). This is a maintainability concern and should ideally be made configurable or derived from model/tokenizer configuration.
  • Clarity on Reranker Logic: The choice of LastPool in the __init__ method for the 'score' task, and how it relates to the subsequent logit-based scoring in the pooler method, could benefit from more explicit comments or documentation, especially given its characterization as a "dirty fix".

Merge Readiness

This pull request makes a good effort to support Qwen3 Reranker models within the existing Qwen3ForCausalLM architecture, acknowledging the current limitations. The approach taken is a pragmatic workaround.

However, the use of hardcoded token IDs is a significant concern that should be addressed. Ideally, these should be configurable or automatically derived. At a minimum, their origin and necessity for being hardcoded should be clearly documented in the code.

Given these points, and the author's own acknowledgement of this being a "dirty fix", I recommend addressing the hardcoded token ID issue and potentially clarifying the LastPool rationale before merging. I am unable to approve pull requests, but I suggest further discussion on these points. Other reviewers should assess the overall architectural implications of this workaround.

@DarkLight1337 DarkLight1337 self-assigned this Jun 6, 2025
@mergify mergify bot added the frontend label Jun 6, 2025
@noooop
Contributor Author

noooop commented Jun 6, 2025

@DarkLight1337

quick review

@noooop noooop marked this pull request as ready for review June 6, 2025 12:17
@noooop noooop requested a review from ywang96 as a code owner June 6, 2025 12:17
@noooop
Contributor Author

noooop commented Jun 6, 2025

tests/models/language/pooling/test_qwen3_reranker.py is a bit hasty; I am refactoring the tests for score, and the next PR will fix it.

@noooop
Contributor Author

noooop commented Jun 6, 2025

The current code is a bit hacky.

Let's wait until next week to see whether the official team can convert the model into the Qwen3ForSequenceClassification format.

@tomaarsen

tomaarsen commented Jun 6, 2025

Hello!

Thank you for your useful work here @noooop. I converted the model to a sequence classification model in the meantime for me to be able to do some testing - it might come in handy for you as well: https://huggingface.co/tomaarsen/Qwen3-Reranker-0.6B-seq-cls

I can also move it to the cross-encoder organization, but I'd like to avoid that until I get models with these larger templates working more nicely with Sentence Transformers, i.e. without having to do all kinds of pre-processing manually.

  • Tom Aarsen

@lovetian1991

quick review

@TPLink32
Thanks.
New problem with embedding:
INFO 06-17 16:37:49 [engine.py:317] Added request embd-be953f7ccae14a96b5f9f8e13b16b4d7-61.
INFO 06-17 16:37:49 [engine.py:317] Added request embd-be953f7ccae14a96b5f9f8e13b16b4d7-62.
INFO 06-17 16:37:49 [engine.py:317] Added request embd-be953f7ccae14a96b5f9f8e13b16b4d7-63.
ERROR 06-17 16:37:49 [engine.py:165] AttributeError("'PlaceholderBlockSpaceManager' object has no attribute 'remove_seq_from_computed_blocks_tracker'")
ERROR 06-17 16:37:49 [engine.py:165] Traceback (most recent call last):
ERROR 06-17 16:37:49 [engine.py:165] File "/data/code/llm_test/vllm_env/lib/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 163, in start
ERROR 06-17 16:37:49 [engine.py:165] self.run_engine_loop()
ERROR 06-17 16:37:49 [engine.py:165] File "/data/code/llm_test/vllm_env/lib/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 226, in run_engine_loop
ERROR 06-17 16:37:49 [engine.py:165] request_outputs = self.engine_step()
ERROR 06-17 16:37:49 [engine.py:165] ^^^^^^^^^^^^^^^^^^
ERROR 06-17 16:37:49 [engine.py:165] File "/data/code/llm_test/vllm_env/lib/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 252, in engine_step
ERROR 06-17 16:37:49 [engine.py:165] raise e
ERROR 06-17 16:37:49 [engine.py:165] File "/data/code/llm_test/vllm_env/lib/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 235, in engine_step
ERROR 06-17 16:37:49 [engine.py:165] return self.engine.step()
ERROR 06-17 16:37:49 [engine.py:165] ^^^^^^^^^^^^^^^^^^
ERROR 06-17 16:37:49 [engine.py:165] File "/data/code/llm_test/vllm_env/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 1296, in step
ERROR 06-17 16:37:49 [engine.py:165] ) = self.scheduler[virtual_engine].schedule()
ERROR 06-17 16:37:49 [engine.py:165] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-17 16:37:49 [engine.py:165] File "/data/code/llm_test/vllm_env/lib/python3.11/site-packages/vllm/core/scheduler.py", line 1553, in schedule
ERROR 06-17 16:37:49 [engine.py:165] scheduler_outputs: SchedulerOutputs = self._schedule()
ERROR 06-17 16:37:49 [engine.py:165] ^^^^^^^^^^^^^^^^
ERROR 06-17 16:37:49 [engine.py:165] File "/data/code/llm_test/vllm_env/lib/python3.11/site-packages/vllm/core/scheduler.py", line 1512, in _schedule
ERROR 06-17 16:37:49 [engine.py:165] return self._schedule_default()
ERROR 06-17 16:37:49 [engine.py:165] ^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-17 16:37:49 [engine.py:165] File "/data/code/llm_test/vllm_env/lib/python3.11/site-packages/vllm/core/scheduler.py", line 1277, in _schedule_default
ERROR 06-17 16:37:49 [engine.py:165] prefills = self._schedule_prefills(budget,
ERROR 06-17 16:37:49 [engine.py:165] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-17 16:37:49 [engine.py:165] File "/data/code/llm_test/vllm_env/lib/python3.11/site-packages/vllm/core/scheduler.py", line 1195, in _schedule_prefills
ERROR 06-17 16:37:49 [engine.py:165] self.remove_seq_from_computed_blocks_tracker(
ERROR 06-17 16:37:49 [engine.py:165] File "/data/code/llm_test/vllm_env/lib/python3.11/site-packages/vllm/core/scheduler.py", line 1715, in remove_seq_from_computed_blocks_tracker
ERROR 06-17 16:37:49 [engine.py:165] self._remove_seq_from_computed_blocks_tracker(seq)
ERROR 06-17 16:37:49 [engine.py:165] File "/data/code/llm_test/vllm_env/lib/python3.11/site-packages/vllm/core/scheduler.py", line 1722, in _remove_seq_from_computed_blocks_tracker
ERROR 06-17 16:37:49 [engine.py:165] self.block_manager.remove_seq_from_computed_blocks_tracker(seq)
ERROR 06-17 16:37:49 [engine.py:165] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-17 16:37:49 [engine.py:165] AttributeError: 'PlaceholderBlockSpaceManager' object has no attribute 'remove_seq_from_computed_blocks_tracker'

@noooop
Contributor Author

noooop commented Jun 17, 2025

ERROR 06-17 16:37:49 [engine.py:165] AttributeError: 'PlaceholderBlockSpaceManager' object has no attribute 'remove_seq_from_computed_blocks_tracker'

fixed by #19686

minpeter pushed a commit to minpeter/vllm that referenced this pull request Jun 24, 2025
Signed-off-by: minpeter <kali2005611@gmail.com>
xjpang pushed a commit to xjpang/vllm that referenced this pull request Jun 30, 2025
Thachnh pushed a commit to deepinfra/vllm that referenced this pull request Jul 1, 2025
@noooop noooop deleted the Qwen3_gte branch July 10, 2025 04:46
@umie0128

@noooop Could you update the vLLM usage instructions in the ModelScope and GitHub READMEs?

Labels
documentation, frontend, ready
Projects
None yet
Development

Successfully merging this pull request may close these issues.

  • [Feature]: support to qwen3 embedding and rerank via vllm serve command
  • [Bug]: Support Qwen3 Reranker
  • [Feature]: Support Qwen3 Embedding & Reranker