[Bug]: Qwen3-Reranker-8B failed to rerank on vllm 0.11.1rc4

### Your current environment

<details>
<summary>The output of <code>python collect_env.py</code></summary>

```text
Collecting environment information...
uv is set
==============================
        System Info
==============================
OS                           : Ubuntu 24.04.3 LTS (x86_64)
GCC version                  : (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
Clang version                : Could not collect
CMake version                : version 3.28.3
Libc version                 : glibc-2.39

==============================
       PyTorch Info
==============================
PyTorch version              : 2.9.0+cu130
Is debug build               : False
CUDA used to build PyTorch   : 13.0
ROCM used to build PyTorch   : N/A

==============================
      Python Environment
==============================
Python version               : 3.13.9 (main, Oct 28 2025, 12:10:42) [Clang 20.1.4 ] (64-bit runtime)
Python platform              : Linux-6.14.0-36-generic-x86_64-with-glibc2.39

==============================
       CUDA / GPU Info
==============================
Is CUDA available            : True
CUDA runtime version         : Could not collect
CUDA_MODULE_LOADING set to   : 
GPU models and configuration : 
GPU 0: NVIDIA GeForce RTX 4090
GPU 1: NVIDIA GeForce RTX 4090
GPU 2: NVIDIA GeForce RTX 4090
GPU 3: NVIDIA GeForce RTX 4090
GPU 4: NVIDIA GeForce RTX 4090
GPU 5: NVIDIA GeForce RTX 4090
GPU 6: NVIDIA GeForce RTX 4090
GPU 7: NVIDIA GeForce RTX 4090

Nvidia driver version        : 580.95.05
cuDNN version                : Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.9.14.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv.so.9.14.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn.so.9.14.0
/usr/lib/x86_64-linux-gnu/libcudnn_engines_precompiled.so.9.14.0
/usr/lib/x86_64-linux-gnu/libcudnn_engines_runtime_compiled.so.9.14.0
/usr/lib/x86_64-linux-gnu/libcudnn_graph.so.9.14.0
/usr/lib/x86_64-linux-gnu/libcudnn_heuristic.so.9.14.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops.so.9.14.0
HIP runtime version          : N/A
MIOpen runtime version       : N/A
Is XNNPACK available         : True

==============================
          CPU Info
==============================
Architecture:                            x86_64
CPU op-mode(s):                          32-bit, 64-bit
Address sizes:                           43 bits physical, 48 bits virtual
Byte Order:                              Little Endian
CPU(s):                                  128
On-line CPU(s) list:                     0-127
Vendor ID:                               AuthenticAMD
Model name:                              AMD EPYC 7542 32-Core Processor
CPU family:                              23
Model:                                   49
Thread(s) per core:                      2
Core(s) per socket:                      32
Socket(s):                               2
Stepping:                                0
Frequency boost:                         enabled
CPU(s) scaling MHz:                      49%
CPU max MHz:                             3408.1079
CPU min MHz:                             1500.0000
BogoMIPS:                                5789.09
Flags:                                   fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sev sev_es
Virtualization:                          AMD-V
L1d cache:                               2 MiB (64 instances)
L1i cache:                               2 MiB (64 instances)
L2 cache:                                32 MiB (64 instances)
L3 cache:                                256 MiB (16 instances)
NUMA node(s):                            2
NUMA node0 CPU(s):                       0-31,64-95
NUMA node1 CPU(s):                       32-63,96-127
Vulnerability Gather data sampling:      Not affected
Vulnerability Ghostwrite:                Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Mitigation; untrained return thunk; SMT enabled with STIBP protection
Vulnerability Spec rstack overflow:      Mitigation; Safe RET
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; Retpolines; IBPB conditional; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Mitigation; IBPB before exit to userspace

==============================
Versions of relevant libraries
==============================
[pip3] flashinfer-python==0.4.1
[pip3] numpy==2.2.6
[pip3] nvidia-cublas==13.0.0.19
[pip3] nvidia-cuda-cupti==13.0.48
[pip3] nvidia-cuda-nvrtc==13.0.48
[pip3] nvidia-cuda-runtime==13.0.48
[pip3] nvidia-cudnn-cu13==9.13.0.50
[pip3] nvidia-cudnn-frontend==1.15.0
[pip3] nvidia-cufft==12.0.0.15
[pip3] nvidia-cufile==1.15.0.42
[pip3] nvidia-curand==10.4.0.35
[pip3] nvidia-cusolver==12.0.3.29
[pip3] nvidia-cusparse==12.6.2.49
[pip3] nvidia-cusparselt-cu13==0.8.0
[pip3] nvidia-cutlass-dsl==4.3.0.dev0
[pip3] nvidia-ml-py==13.580.82
[pip3] nvidia-nccl-cu13==2.27.7
[pip3] nvidia-nvjitlink==13.0.39
[pip3] nvidia-nvshmem-cu13==3.3.24
[pip3] nvidia-nvtx==13.0.39
[pip3] pyzmq==27.1.0
[pip3] torch==2.9.0+cu130
[pip3] torchaudio==2.9.0+cu130
[pip3] torchvision==0.24.0+cu130
[pip3] transformers==4.57.1
[pip3] triton==3.5.0
[pip3] triton-kernels==3.5.0
[conda] Could not collect

==============================
         vLLM Info
==============================
ROCM Version                 : Could not collect
vLLM Version                 : 0.11.1rc5.dev0+gf25754470.d20251030 (git sha: f25754470, date: 20251030)
vLLM Build Flags:
  CUDA Archs: Not Set; ROCm: Disabled
GPU Topology:
  	[4mGPU0	GPU1	GPU2	GPU3	GPU4	GPU5	GPU6	GPU7	NIC0	CPU Affinity	NUMA Affinity	GPU NUMA ID[0m
GPU0	 X 	NODE	NODE	NODE	SYS	SYS	SYS	SYS	SYS	0-31,64-95	0		N/A
GPU1	NODE	 X 	NODE	NODE	SYS	SYS	SYS	SYS	SYS	0-31,64-95	0		N/A
GPU2	NODE	NODE	 X 	NODE	SYS	SYS	SYS	SYS	SYS	0-31,64-95	0		N/A
GPU3	NODE	NODE	NODE	 X 	SYS	SYS	SYS	SYS	SYS	0-31,64-95	0		N/A
GPU4	SYS	SYS	SYS	SYS	 X 	NODE	NODE	NODE	NODE	32-63,96-127	1		N/A
GPU5	SYS	SYS	SYS	SYS	NODE	 X 	NODE	NODE	NODE	32-63,96-127	1		N/A
GPU6	SYS	SYS	SYS	SYS	NODE	NODE	 X 	NODE	PHB	32-63,96-127	1		N/A
GPU7	SYS	SYS	SYS	SYS	NODE	NODE	NODE	 X 	NODE	32-63,96-127	1		N/A
NIC0	SYS	SYS	SYS	SYS	NODE	NODE	PHB	NODE	 X 				

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

NIC Legend:

  NIC0: mlx5_bond_0

==============================
     Environment Variables
==============================
VLLM_HOST_IP=10.88.88.13
VLLM_USE_V1=1
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1
```
</details>

### 🐛 Describe the bug

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 vllm serve Qwen/Qwen3-Reranker-8B --host 0.0.0.0 --port 10000  --tensor-parallel-size 4 --hf_overrides '{"architectures": ["Qwen3ForSequenceClassification"],"classifier_from_token": ["no", "yes"], "is_original_qwen3_reranker": true}'
```
with curl
```
-H "Authorization: Bearer sk-" \
-H "Content-Type: application/json" \
-d '{
  "model": "Qwen/Qwen3-Reranker-8B",
  "query": "中国首都在哪",
  "documents": [
    "北京",
    "西京",
    "南京",
    "东京","面筋"
  ],"return_documents":true
}'
{"id":"rerank-18dc216f35d64d458b54fe03d40549d3","model":"Qwen/Qwen3-Reranker-8B","usage":{"total_tokens":27},"results":[{"index":0,"document":{"text":"北京","multi_modal":null},"relevance_score":0.5},{"index":1,"document":{"text":"西京","multi_modal":null},"relevance_score":0.5},{"index":2,"document":{"text":"南京","multi_modal":null},"relevance_score":0.5},{"index":3,"document":{"text":"东京","multi_modal":null},"relevance_score":0.5},{"index":4,"document":{"text":"面筋","multi_modal":null},"relevance_score":0.5}]}#
```
Using the template will also yield the same result of 0.5
Even without using a template, it shouldn't be all 0.5
It works fine in 0.11.0, but this problem occurs in 0.11.1rc3 and 0.11.1rc4 and 0.11.1rc5
Not use hf_overrides:
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 vllm serve Qwen/Qwen3-Reranker-8B --host 0.0.0.0 --port 10000  --tensor-parallel-size 4 --task score 
```
with curl
```
-H "Authorization: Bearer sk-" \
-H "Content-Type: application/json" \
-d '{
  "model": "Qwen/Qwen3-Reranker-8B",
  "query": "中国首都在哪",
  "documents": [
    "北京",
    "西京",
    "南京",
    "东京","面筋"
  ],"return_documents":true
}'
{"id":"rerank-b3c17e2f440b473193761696ed9d0902","model":"Qwen/Qwen3-Reranker-8B","usage":{"total_tokens":32},"results":[{"index":4,"document":{"text":"面筋","multi_modal":null},"relevance_score":0.8528335094451904},{"index":3,"document":{"text":"东京","multi_modal":null},"relevance_score":0.8199970126152039},{"index":2,"document":{"text":"南京","multi_modal":null},"relevance_score":0.8143513202667236},{"index":0,"document":{"text":"北京","multi_modal":null},"relevance_score":0.8059771060943604},{"index":1,"document":{"text":"西京","multi_modal":null},"relevance_score":0.48023074865341187}]}#
```
### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

[Bug]: Qwen3-Reranker-8B failed to rerank on vllm 0.11.1rc4 #27857

Your current environment

🐛 Describe the bug

Before submitting a new issue...

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

[Bug]: Qwen3-Reranker-8B failed to rerank on vllm 0.11.1rc4 #27857

Description

Your current environment

🐛 Describe the bug

Before submitting a new issue...

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions