Update dependency vllm to v0.8.5 [SECURITY] - autoclosed #28

renovate · 2025-01-29T04:20:21Z

This PR contains the following updates:

Package	Change	Age	Adoption	Passing	Confidence
vllm	`==v0.6.6` -> `==0.8.5`
vllm	`==v0.6.4` -> `==0.8.5`
vllm	`==0.6.6` -> `==0.8.5`
vllm	`==0.6.4` -> `==0.8.5`

vllm: Malicious model to RCE by torch.load in hf_model_weights_iterator

CVE-2025-24357 / GHSA-rh4j-5rhw-hr54

More information

Details

Description

The vllm/model_executor/weight_utils.py implements hf_model_weights_iterator to load the model checkpoint, which is downloaded from huggingface. It use torch.load function and weights_only parameter is default value False. There is a security warning on https://pytorch.org/docs/stable/generated/torch.load.html, when torch.load load a malicious pickle data it will execute arbitrary code during unpickling.

Impact

This vulnerability can be exploited to execute arbitrary codes and OS commands in the victim machine who fetch the pretrained repo remotely.

Note that most models now use the safetensors format, which is not vulnerable to this issue.

References

Severity

CVSS Score: 7.5 / 10 (High)
Vector String: CVSS:3.1/AV:N/AC:H/PR:N/UI:R/S:U/C:H/I:H/A:H

References

This data is provided by OSV and the GitHub Advisory Database (CC-BY 4.0).

vLLM uses Python 3.12 built-in hash() which leads to predictable hash collisions in prefix cache

CVE-2025-25183 / GHSA-rm76-4mrf-v9r8

More information

Details

Summary

Maliciously constructed prompts can lead to hash collisions, resulting in prefix cache reuse, which can interfere with subsequent responses and cause unintended behavior.

Details

vLLM's prefix caching makes use of Python's built-in hash() function. As of Python 3.12, the behavior of hash(None) has changed to be a predictable constant value. This makes it more feasible that someone could try exploit hash collisions.

Impact

The impact of a collision would be using cache that was generated using different content. Given knowledge of prompts in use and predictable hashing behavior, someone could intentionally populate the cache using a prompt known to collide with another prompt in use.

Solution

We address this problem by initializing hashes in vllm with a value that is no longer constant and predictable. It will be different each time vllm runs. This restores behavior we got in Python versions prior to 3.12.

Using a hashing algorithm that is less prone to collision (like sha256, for example) would be the best way to avoid the possibility of a collision. However, it would have an impact to both performance and memory footprint. Hash collisions may still occur, though they are no longer straight forward to predict.

To give an idea of the likelihood of a collision, for randomly generated hash values (assuming the hash generation built into Python is uniformly distributed), with a cache capacity of 50,000 messages and an average prompt length of 300, a collision will occur on average once every 1 trillion requests.

References

Severity

CVSS Score: 2.6 / 10 (Low)
Vector String: CVSS:3.1/AV:N/AC:H/PR:L/UI:R/S:U/C:N/I:L/A:N

References

This data is provided by OSV and the GitHub Advisory Database (CC-BY 4.0).

CVE-2025-24357 Malicious model remote code execution fix bypass with PyTorch < 2.6.0

GHSA-ggpf-24jw-3fcw

More information

Details

Description

GHSA-rh4j-5rhw-hr54 reported a vulnerability where loading a malicious model could result in code execution on the vllm host. The fix applied to specify weights_only=True to calls to torch.load() did not solve the problem prior to PyTorch 2.6.0.

PyTorch has issued a new CVE about this problem: GHSA-53q9-r3pm-6pq6

This means that versions of vLLM using PyTorch before 2.6.0 are vulnerable to this problem.

Background Knowledge

When users install VLLM according to the official manual

But the version of PyTorch is specified in the requirements. txt file

So by default when the user install VLLM, it will install the PyTorch with version 2.5.1

In CVE-2025-24357, weights_only=True was used for patching, but we know this is not secure.
Because we found that using Weights_only=True in pyTorch before 2.5.1 was unsafe

Here, we use this interface to prove that it is not safe.

Fix

update PyTorch version to 2.6.0

Credit

This vulnerability was found By Ji'an Zhou and Li'shuo Song

Severity

CVSS Score: 9.8 / 10 (Critical)
Vector String: CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H

References

This data is provided by OSV and the GitHub Advisory Database (CC-BY 4.0).

vLLM denial of service via outlines unbounded cache on disk

CVE-2025-29770 / GHSA-mgrm-fgjv-mhv8

More information

Details

Impact

The outlines library is one of the backends used by vLLM to support structured output (a.k.a. guided decoding). Outlines provides an optional cache for its compiled grammars on the local filesystem. This cache has been on by default in vLLM. Outlines is also available by default through the OpenAI compatible API server.

The affected code in vLLM is vllm/model_executor/guided_decoding/outlines_logits_processors.py, which unconditionally uses the cache from outlines. vLLM should have this off by default and allow administrators to opt-in due to the potential for abuse.

A malicious user can send a stream of very short decoding requests with unique schemas, resulting in an addition to the cache for each request. This can result in a Denial of Service if the filesystem runs out of space.

Note that even if vLLM was configured to use a different backend by default, it is still possible to choose outlines on a per-request basis using the guided_decoding_backend key of the extra_body field of the request.

This issue applies to the V0 engine only. The V1 engine is not affected.

Patches

https://github.com/vllm-project/vllm/pull/14837

The fix is to disable this cache by default since it does not provide an option to limit its size. If you want to use this cache anyway, you may set the VLLM_V0_USE_OUTLINES_CACHE environment variable to 1.

Workarounds

There is no way to workaround this issue in existing versions of vLLM other than preventing untrusted access to the OpenAI compatible API server.

References

Severity

CVSS Score: 6.5 / 10 (Medium)
Vector String: CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H

References

This data is provided by OSV and the GitHub Advisory Database (CC-BY 4.0).

vLLM Allows Remote Code Execution via Mooncake Integration

CVE-2025-29783 / GHSA-x3m8-f7g5-qhm7

More information

Details

Summary

When vLLM is configured to use Mooncake, unsafe deserialization exposed directly over ZMQ/TCP will allow attackers to execute remote code on distributed hosts.

Details

Pickle deserialization vulnerabilities are well documented.
The mooncake pipe is exposed over the network (by design to enable disaggregated prefilling across distributed environments) using ZMQ over TCP, greatly increasing exploitability. ~~Further, the mooncake integration opens these sockets listening on all interfaces on the host, meaning it can not be configured to only use a private, trusted network.~~

Only sender_socket and receiver_ack are allowed to be accessed publicly, while the data actually decompressed by pickle.loads() comes from recv_bytes. Its interface is defined as self.receiver_socket.connect(f\"tcp://{d_host}:{d_rank_offset + 1}\"), where d_host is decode_host, a locally defined address 192.168.0.139,from mooncake.json (https://github.com/kvcache-ai/Mooncake/blob/main/doc/en/vllm-integration-v0.2.md?plain=1#L36).

The root problem is recv_tensor() calls _recv_impl which passes the raw network bytes to pickle.loads(). Additionally, it does not appear that there are any controls (network, authentication, etc) to prevent arbitrary users from sending this payload to the affected service.

Impact

This is a remote code execution vulnerability impacting any deployments using Mooncake to distribute KV across distributed hosts.

Remediation

This issue is resolved by https://github.com/vllm-project/vllm/pull/14228

Severity

CVSS Score: 9.0 / 10 (Critical)
Vector String: CVSS:3.1/AV:A/AC:L/PR:L/UI:N/S:C/C:H/I:H/A:H

References

This data is provided by OSV and the GitHub Advisory Database (CC-BY 4.0).

vLLM deserialization vulnerability in vllm.distributed.GroupCoordinator.recv_object

CVE-2024-9052 / GHSA-pgr7-mhp5-fgjp

More information

Details

vllm-project vllm version 0.6.0 contains a vulnerability in the distributed training API. The function vllm.distributed.GroupCoordinator.recv_object() deserializes received object bytes using pickle.loads() without sanitization, leading to a remote code execution vulnerability.

Maintainer perspective

Note that vLLM does NOT use the code as described in the report on huntr. The problem only exists if you use these internal APIs in a way that exposes them to a network as described. The vllm team was not involved in the analysis of this report and the decision to assign it a CVE.

Severity

CVSS Score: 9.8 / 10 (Critical)
Vector String: CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H

References

This data is provided by OSV and the GitHub Advisory Database (CC-BY 4.0).

vLLM vulnerable to Denial of Service by abusing xgrammar cache

GHSA-hf3c-wxg2-49q9

More information

Details

Impact

This report is to highlight a vulnerability in XGrammar, a library used by the structured output feature in vLLM. The XGrammar advisory is here: GHSA-389x-67px-mjg3

The xgrammar library is the default backend used by vLLM to support structured output (a.k.a. guided decoding). Xgrammar provides a required, built-in cache for its compiled grammars stored in RAM. xgrammar is available by default through the OpenAI compatible API server with both the V0 and V1 engines.

A malicious user can send a stream of very short decoding requests with unique schemas, resulting in an addition to the cache for each request. This can result in a Denial of Service by consuming all of the system's RAM.

Note that even if vLLM was configured to use a different backend by default, it is still possible to choose xgrammar on a per-request basis using the guided_decoding_backend key of the extra_body field of the request with the V0 engine. This per-request choice is not available when using the V1 engine.

Patches

https://github.com/vllm-project/vllm/pull/16283

Workarounds

There is no way to workaround this issue in existing versions of vLLM other than preventing untrusted access to the OpenAI compatible API server.

References

GHSA-389x-67px-mjg3

Severity

CVSS Score: 6.5 / 10 (Medium)
Vector String: CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H

References

This data is provided by OSV and the GitHub Advisory Database (CC-BY 4.0).

Data exposure via ZeroMQ on multi-node vLLM deployment

CVE-2025-30202 / GHSA-9f8f-2vmf-885j

More information

Details

Impact

In a multi-node vLLM deployment, vLLM uses ZeroMQ for some multi-node communication purposes. The primary vLLM host opens an XPUB ZeroMQ socket and binds it to ALL interfaces. While the socket is always opened for a multi-node deployment, it is only used when doing tensor parallelism across multiple hosts.

Any client with network access to this host can connect to this XPUB socket unless its port is blocked by a firewall. Once connected, these arbitrary clients will receive all of the same data broadcasted to all of the secondary vLLM hosts. This data is internal vLLM state information that is not useful to an attacker.

By potentially connecting to this socket many times and not reading data published to them, an attacker can also cause a denial of service by slowing down or potentially blocking the publisher.

Detailed Analysis

The XPUB socket in question is created here:

https://github.com/vllm-project/vllm/blob/c21b99b91241409c2fdf9f3f8c542e8748b317be/vllm/distributed/device_communicators/shm_broadcast.py#L236-L237

Data is published over this socket via MessageQueue.enqueue() which is called by MessageQueue.broadcast_object():

https://github.com/vllm-project/vllm/blob/790b79750b596043036b9fcbee885827fdd2ef3d/vllm/distributed/device_communicators/shm_broadcast.py#L452-L453

https://github.com/vllm-project/vllm/blob/790b79750b596043036b9fcbee885827fdd2ef3d/vllm/distributed/device_communicators/shm_broadcast.py#L475-L478

The MessageQueue.broadcast_object() method is called by the GroupCoordinator.broadcast_object() method in parallel_state.py:

https://github.com/vllm-project/vllm/blob/790b79750b596043036b9fcbee885827fdd2ef3d/vllm/distributed/parallel_state.py#L364-L366

The broadcast over ZeroMQ is only done if the GroupCoordinator was created with use_message_queue_broadcaster set to True:

https://github.com/vllm-project/vllm/blob/790b79750b596043036b9fcbee885827fdd2ef3d/vllm/distributed/parallel_state.py#L216-L219

The only case where GroupCoordinator is created with use_message_queue_broadcaster is the coordinator for the tensor parallelism group:

https://github.com/vllm-project/vllm/blob/790b79750b596043036b9fcbee885827fdd2ef3d/vllm/distributed/parallel_state.py#L931-L936

To determine what data is broadcasted to the tensor parallism group, we must continue tracing. GroupCoordinator.broadcast_object() is called by GroupCoordinator.broadcoast_tensor_dict():

https://github.com/vllm-project/vllm/blob/790b79750b596043036b9fcbee885827fdd2ef3d/vllm/distributed/parallel_state.py#L489

which is called by broadcast_tensor_dict() in communication_op.py:

https://github.com/vllm-project/vllm/blob/790b79750b596043036b9fcbee885827fdd2ef3d/vllm/distributed/communication_op.py#L29-L34

If we look at _get_driver_input_and_broadcast() in the V0 worker_base.py, we'll see how this tensor dict is formed:

https://github.com/vllm-project/vllm/blob/790b79750b596043036b9fcbee885827fdd2ef3d/vllm/worker/worker_base.py#L332-L352

but the data actually sent over ZeroMQ is the metadata_list portion that is split from this tensor_dict. The tensor parts are sent via torch.distributed and only metadata about those tensors is sent via ZeroMQ.

https://github.com/vllm-project/vllm/blob/54a66e5fee4a1ea62f1e4c79a078b20668e408c6/vllm/distributed/parallel_state.py#L61-L83

Patches

No fix yet.

Workarounds

Prior to the fix, your options include:

Do not expose the vLLM host to a network where any untrusted connections may reach the host.
Ensure that only the other vLLM hosts are able to connect to the TCP port used for the XPUB socket. Note that port used is random.

References

Relevant code first introduced in https://github.com/vllm-project/vllm/pull/6183

Severity

CVSS Score: 7.5 / 10 (High)
Vector String: CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H

References

This data is provided by OSV and the GitHub Advisory Database (CC-BY 4.0).

vLLM Vulnerable to Remote Code Execution via Mooncake Integration

CVE-2025-32444 / GHSA-hj4w-hm2g-p6w5

More information

Details

Impacted Deployments

Note that vLLM instances that do NOT make use of the mooncake integration are NOT vulnerable.

Description

vLLM integration with mooncake is vaulnerable to remote code execution due to using pickle based serialization over unsecured ZeroMQ sockets. The vulnerable sockets were set to listen on all network interfaces, increasing the likelihood that an attacker is able to reach the vulnerable ZeroMQ sockets to carry out an attack.

This is a similar to GHSA - x3m8 - f7g5 - qhm7, the problem is in

https://github.com/vllm-project/vllm/blob/32b14baf8a1f7195ca09484de3008063569b43c5/vllm/distributed/kv_transfer/kv_pipe/mooncake_pipe.py#L179

Here recv_pyobj() Contains implicit pickle.loads(), which leads to potential RCE.

Severity

CVSS Score: 10.0 / 10 (Critical)
Vector String: CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:H

References

This data is provided by OSV and the GitHub Advisory Database (CC-BY 4.0).

Release Notes

vllm-project/vllm (vllm)

`v0.8.5`

Compare Source

This release contains 310 commits from 143 contributors (55 new contributors!).

Highlights

This release features important multi-modal bug fixes, day 0 support for Qwen3, and xgrammar's structure tag feature for tool calling.

Model Support

Day 0 support for Qwen3 and Qwen3MoE. This release fixes fp8 weight loading (#17318) and adds tuned MoE configs (#17328).
Add ModernBERT (#16648)
Add Granite Speech Support (#16246)
Add PLaMo2 (#14323)
Add Kimi-VL model support (#16387)
Add Qwen2.5-Omni model support (thinker only) (#15130)
Snowflake Arctic Embed (Family) (#16649)
Accuracy fixes for Llama4 Int4 (#16801), chat template for Llama 4 models (#16428), enhanced AMD support (#16674, #16847)

V1 Engine

Add structural_tag support using xgrammar (#17085)
Disaggregated serving:
- KV Connector API V1 (#15960)
- Adding LMCache KV connector for v1 (#16625)
Clean up: Remove Sampler from Model Code (#17084)
MLA: Simplification to batch P/D reordering (#16673)
Move usage stats to worker and start logging TPU hardware (#16211)
Support FlashInfer Attention (#16684)
Faster incremental detokenization (#15137)
EAGLE-3 Support (#16937)

Features

Validate urls object for multimodal content parts (#16990)
Prototype support sequence parallelism using compilation pass (#16155)
Add sampling params to v1/audio/transcriptions endpoint (#16591)
Enable vLLM to Dynamically Load LoRA from a Remote Server (#10546)
Add vllm bench [latency, throughput] CLI commands (#16508)

Performance

Attention:
- FA3 decode perf improvement - single mma warp group support for head dim 128 (#16864)
- Update to lastest FA3 code (#13111)
- Support Cutlass MLA for Blackwell GPUs (#16032)
MoE:
- Add expert_map support to Cutlass FP8 MOE (#16861)
- Add fp8_w8a8 fused MoE kernel tuning configs for DeepSeek V3/R1 on NVIDIA H20 (#16753)
Support Microsoft Runtime Kernel Lib for our Low Precision Computation - BitBLAS (#6036)
Optimize rotary_emb implementation to use Triton operator for improved performance (#16457)

Hardwares

TPU:
- Enable structured decoding on TPU V1 (#16499)
- Capture multimodal encoder during model compilation (#15051)
- Enable Top-P (#16843)
AMD:
- AITER Fused MOE V1 Support (#16752)
- Integrate Paged Attention Kernel from AITER (#15001)
- Support AITER MLA (#15893)
- Upstream prefix prefill speed up for vLLM V1 (#13305)
- Adding fp8 and variable length sequence support to Triton FAv2 kernel (#12591)
- Add skinny gemms for unquantized linear on ROCm (#15830)
- Follow-ups for Skinny Gemms on ROCm. (#17011)

Documentation

Add open-webui example (#16747)
Document Matryoshka Representation Learning support (#16770)
Add a security guide (#17230)
Add example to run DeepSeek with Ray Serve LLM (#17134)
Benchmarks for audio models (#16505)

Security and Dependency Updates

Don't bind tcp zmq socket to all interfaces (#17197)
Use safe serialization and fix zmq setup for mooncake pipe (#17192)
Bump Transformers to 4.51.3 (#17116)

Build and testing

Add property-based testing for vLLM endpoints using an API defined by an OpenAPI 3.1 schema (#16721)

Breaking changes 🚨

--enable-chunked-prefill, --multi-step-stream-outputs, --disable-chunked-mm-input can no longer explicitly be set to False. Instead, add no- to the start of the argument (i.e. --enable-chunked-prefill and --no-enable-chunked-prefill) (https://github.com/vllm-project/vllm/pull/16533)

What's Changed

Improve configs - SchedulerConfig by @hmellor in https://github.com/vllm-project/vllm/pull/16533
[Misc] remove warning if triton>=3.2.0 by @DefTruth in https://github.com/vllm-project/vllm/pull/16553
[Misc] refactor examples by @reidliu41 in https://github.com/vllm-project/vllm/pull/16563
[Misc] Update usage with mooncake lib for kv transfer by @ShangmingCai in https://github.com/vllm-project/vllm/pull/16523
[fix]: Dockerfile.ppc64le fixes for opencv-python and hf-xet by @Shafi-Hussain in https://github.com/vllm-project/vllm/pull/16048
[Bugfix] Multi-modal caches not acting like LRU caches by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/16593
[TPU][V1] Fix exponential padding when max-num-batched-tokens is not a power of 2 by @NickLucche in https://github.com/vllm-project/vllm/pull/16596
Fix triton install condition on CPU by @hmellor in https://github.com/vllm-project/vllm/pull/16600
s390x: Fix PyArrow build and add CPU test script for Buildkite CI by @Nash-123 in https://github.com/vllm-project/vllm/pull/16036
[Model][VLM] Add Kimi-VL model support by @courage17340 in https://github.com/vllm-project/vllm/pull/16387
[Hardware][TPU] Add torchvision to tpu dependency file by @lsy323 in https://github.com/vllm-project/vllm/pull/16616
[DOC][TPU] Add core idea about avoiding recompilation after warmup by @yaochengji in https://github.com/vllm-project/vllm/pull/16614
config check sleep mode support oot platforms by @celestialli in https://github.com/vllm-project/vllm/pull/16562
[Core][Bugfix] Fix Offline MM Beam Search by @alex-jw-brooks in https://github.com/vllm-project/vllm/pull/16390
[Kernel] moe wna16 marlin kernel by @jinzhen-lin in https://github.com/vllm-project/vllm/pull/14447
[BugFix]: Update minimum pyzmq version by @taneem-ibrahim in https://github.com/vllm-project/vllm/pull/16549
[Bugfix] Fix tests/kernels/test_mamba_ssm_ssd.py by @tlrmchlsmth in https://github.com/vllm-project/vllm/pull/16623
[Bugfix] Fix broken GritLM model and tests (missing pooling_metadata) by @pooyadavoodi in https://github.com/vllm-project/vllm/pull/16631
Add vllm bench [latency, throughput] CLI commands by @mgoin in https://github.com/vllm-project/vllm/pull/16508
Fix vLLM x torch.compile config caching by @zou3519 in https://github.com/vllm-project/vllm/pull/16491
[Misc] refactor argument parsing in examples by @reidliu41 in https://github.com/vllm-project/vllm/pull/16635
[CI/Build] Fix LoRA OOM by @jeejeelee in https://github.com/vllm-project/vllm/pull/16624
Add "/server_info" endpoint in api_server to retrieve the vllm_config. by @Cangxihui in https://github.com/vllm-project/vllm/pull/16572
[Kernel] Remove redundant Exp calculations by @DefTruth in https://github.com/vllm-project/vllm/pull/16123
[Misc] Update compressed-tensors WNA16 to support zero-points by @dsikka in https://github.com/vllm-project/vllm/pull/14211
[Misc] Enable vLLM to Dynamically Load LoRA from a Remote Server by @angkywilliam in https://github.com/vllm-project/vllm/pull/10546
[Model] Add PLaMo2 by @Alnusjaponica in https://github.com/vllm-project/vllm/pull/14323
[Bugfix] fix gpu docker image mis benchmarks dir by @lengrongfu in https://github.com/vllm-project/vllm/pull/16628
[Misc] Modify LRUCache touch by @jeejeelee in https://github.com/vllm-project/vllm/pull/16689
Disable remote caching when calling compile_fx by @zou3519 in https://github.com/vllm-project/vllm/pull/16611
[Feature] add model aware kv ops helper by @billishyahao in https://github.com/vllm-project/vllm/pull/16020
[ROCM] Bind triton version to 3.2 in requirements-built.txt by @SageMoore in https://github.com/vllm-project/vllm/pull/16664
[V1][Structured Output] Move xgrammar related utils to backend_xgrammar.py by @shen-shanshan in https://github.com/vllm-project/vllm/pull/16578
[CI] Cleanup additional_dependencies: [toml] for pre-commit yapf hook by @yankay in https://github.com/vllm-project/vllm/pull/16405
[Misc] refactor examples series by @reidliu41 in https://github.com/vllm-project/vllm/pull/16708
[Doc] Improve OOM troubleshooting by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/16704
[Bugfix][Kernel] fix potential cuda graph broken for merge_attn_states kernel by @DefTruth in https://github.com/vllm-project/vllm/pull/16693
[Model] support modernbert by @xsank in https://github.com/vllm-project/vllm/pull/16648
[Hardware] Add processor inputs to platform validation by @joerunde in https://github.com/vllm-project/vllm/pull/16680
Improve error for structured output backend selection by @hmellor in https://github.com/vllm-project/vllm/pull/16717
[Misc] Remove redundant comment by @jianzs in https://github.com/vllm-project/vllm/pull/16703
Help user create custom model for Transformers backend remote code models by @hmellor in https://github.com/vllm-project/vllm/pull/16719
[V1][Performance] Implement custom serializaton for MultiModalKwargs [Rebased] by @p88h in https://github.com/vllm-project/vllm/pull/16432
[V1][Spec Dec Bug Fix] Respect Spec Dec Method Specification by @luyuzhe111 in https://github.com/vllm-project/vllm/pull/16636
Adding vllm buildkite job for IBM Power by @AaruniAggarwal in https://github.com/vllm-project/vllm/pull/16679
[V1][Frontend] Improve Shutdown And Logs by @robertgshaw2-redhat in https://github.com/vllm-project/vllm/pull/11737
[rocm][V0] fix selection logic for custom PA in V0 by @divakar-amd in https://github.com/vllm-project/vllm/pull/16426
[Bugfix] Update Florence-2 tokenizer to make grounding tasks work by @Isotr0py in https://github.com/vllm-project/vllm/pull/16734
[Bugfix] Revert max_prompt_len validation for decoder-only models. by @davidheineman in https://github.com/vllm-project/vllm/pull/16741
[V1] Remove log noise when idle by @russellb in https://github.com/vllm-project/vllm/pull/16735
[Ray] Improve documentation on batch inference by @richardliaw in https://github.com/vllm-project/vllm/pull/16609
[misc] ignore marlin_moe_wna16 local gen codes by @DefTruth in https://github.com/vllm-project/vllm/pull/16760
[Doc] Add more tips to avoid OOM by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/16765
[doc] add open-webui example by @reidliu41 in https://github.com/vllm-project/vllm/pull/16747
[Bugfix] Fix GLM4 model by @intervitens in https://github.com/vllm-project/vllm/pull/16618
[Doc] Fix a 404 link in installation/cpu.md by @windsonsea in https://github.com/vllm-project/vllm/pull/16773
[Misc] refactor examples series - lmcache by @reidliu41 in https://github.com/vllm-project/vllm/pull/16758
Improve configs - TokenizerPoolConfig + DeviceConfig by @hmellor in https://github.com/vllm-project/vllm/pull/16603
fix: hyperlink by @reidliu41 in https://github.com/vllm-project/vllm/pull/16778
[Doc] Make sure to update vLLM when installing latest code by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/16781
[Doc] Document Matryoshka Representation Learning support by @noooop in https://github.com/vllm-project/vllm/pull/16770
[Doc] Changed explanation of generation_tokens_total and prompt_tokens_total counter type metrics to avoid confusion by @insukim1994 in https://github.com/vllm-project/vllm/pull/16784
[V1][Perf] Faster incremental detokenization by @njhill in https://github.com/vllm-project/vllm/pull/15137
[Bugfix]Fix index out of range error in api server log by @WangErXiao in https://github.com/vllm-project/vllm/pull/16787
[Kernel] Add fp8_w8a8 fused MoE kernel tuning configs for DeepSeek V3/R1 on NVIDIA H20 by @Ximingwang-09 in https://github.com/vllm-project/vllm/pull/16753
[Model] use AutoWeightsLoader for olmoe,opt,orion,persimmon,phi3_small by @lengrongfu in https://github.com/vllm-project/vllm/pull/16548
[TPU][V1] Fix padding recompilation when max-num-batched-tokens is not even by @NickLucche in https://github.com/vllm-project/vllm/pull/16726
[V1][TPU] Enable Top K by @NickLucche in https://github.com/vllm-project/vllm/pull/15489
[ROCM] enable aiter fused moe kernel for llama4 bf16 checkpoints by @sijiac in https://github.com/vllm-project/vllm/pull/16674
[V1][Metrics] Fix http metrics middleware by @markmc in https://github.com/vllm-project/vllm/pull/15894
[MLA] Simplification to batch P/D reordering by @njhill in https://github.com/vllm-project/vllm/pull/16673
[P/D][V1] KV Connector API V1 by @ApostaC in https://github.com/vllm-project/vllm/pull/15960
[Attention] Update to lastest FA3 code by @LucasWilkinson in https://github.com/vllm-project/vllm/pull/13111
Add property-based testing for vLLM endpoints using an API defined by an OpenAPI 3.1 schema by @tarukumar in https://github.com/vllm-project/vllm/pull/16721
[Doc] Improve help examples for --compilation-config by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/16729
[Misc] Update outdated note: LMCache now supports chunked prefill by @chaunceyjiang in https://github.com/vllm-project/vllm/pull/16697
[V1][Structured Output] Minor modification to _validate_structured_output() by @shen-shanshan in https://github.com/vllm-project/vllm/pull/16748
Add hardware print to TPU V1 test by @mgoin in https://github.com/vllm-project/vllm/pull/16792
[BugFix] Accuracy fix for llama4 int4 - improperly casted scales by @LucasWilkinson in https://github.com/vllm-project/vllm/pull/16801
Improve configs - MultiModalConfig + PoolerConfig + DecodingConfig by @hmellor in https://github.com/vllm-project/vllm/pull/16789
[Misc] add collect_env to cli and docker image by @lengrongfu in https://github.com/vllm-project/vllm/pull/16759
[ROCm] [Attention] Cleanup ROCm output passing by @ProExpertProg in https://github.com/vllm-project/vllm/pull/16431
[Bugfix] fix pp for llama4 by @luccafong in https://github.com/vllm-project/vllm/pull/16746
[Doc] add podman setup instructions for official image by @nathan-weinberg in https://github.com/vllm-project/vllm/pull/16796
[Docs] Fix a link and grammar issue in production-stack.md by @windsonsea in https://github.com/vllm-project/vllm/pull/16809
[Model] use AutoWeightsLoader for BigCode, GPT-J by @jonghyunchoe in https://github.com/vllm-project/vllm/pull/16823
[Misc] Clean up Kimi-VL by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/16833
Fix nullable_kvs fallback by @hmellor in https://github.com/vllm-project/vllm/pull/16837
[New

Configuration

📅 Schedule: Branch creation - "" in timezone America/Toronto, Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about these updates again.

If you want to rebase/retry this PR, check this box

This PR was generated by Mend Renovate. View the repository job log.

renovate · 2025-01-29T04:20:22Z

⚠️ Artifact update problem

Renovate failed to update artifacts related to this branch. You probably do not want to merge this PR as-is.

♻ Renovate will retry this branch, including artifacts, only when one of the following happens:

any of the package files in this branch needs updating, or
the branch becomes conflicted, or
you click the rebase/retry checkbox if found above, or
you rename this PR's title to start with "rebase!" to trigger it manually

The artifact failure details are included below:

File name: model-servers/vllm/0.6.4/Pipfile.lock

Command failed: pipenv lock
Locking  dependencies...
CRITICAL:pipenv.patched.pip._internal.resolution.resolvelib.factory:Cannot 
install -r /tmp/pipenv-dfai9v1d-requirements/pipenv-0e31scjl-constraints.txt 
(line 16) and transformers~=4.40.2 because these package versions have 
conflicting dependencies.
[ResolutionFailure]:   File 
"/opt/containerbase/tools/pipenv/2025.0.1/3.11.12/lib/python3.11/site-packages/p
ipenv/resolver.py", line 451, in main
[ResolutionFailure]:       _main(
[ResolutionFailure]:   File 
"/opt/containerbase/tools/pipenv/2025.0.1/3.11.12/lib/python3.11/site-packages/p
ipenv/resolver.py", line 436, in _main
[ResolutionFailure]:       resolve_packages(
[ResolutionFailure]:   File 
"/opt/containerbase/tools/pipenv/2025.0.1/3.11.12/lib/python3.11/site-packages/p
ipenv/resolver.py", line 400, in resolve_packages
[ResolutionFailure]:       results, resolver = resolve_deps(
[ResolutionFailure]:       ^^^^^^^^^^^^^
[ResolutionFailure]:   File 
"/opt/containerbase/tools/pipenv/2025.0.1/3.11.12/lib/python3.11/site-packages/p
ipenv/utils/resolver.py", line 979, in resolve_deps
[ResolutionFailure]:       results, hashes, internal_resolver = 
actually_resolve_deps(
[ResolutionFailure]:       ^^^^^^^^^^^^^^^^^^^^^^
[ResolutionFailure]:   File 
"/opt/containerbase/tools/pipenv/2025.0.1/3.11.12/lib/python3.11/site-packages/p
ipenv/utils/resolver.py", line 747, in actually_resolve_deps
[ResolutionFailure]:       resolver.resolve()
[ResolutionFailure]:   File 
"/opt/containerbase/tools/pipenv/2025.0.1/3.11.12/lib/python3.11/site-packages/p
ipenv/utils/resolver.py", line 474, in resolve
[ResolutionFailure]:       raise ResolutionFailure(message=e)
Your dependencies could not be resolved. You likely have a mismatch in your 
sub-dependencies.
You can use $ pipenv run pip install <requirement_name> to bypass this 
mechanism, then run $ pipenv graph to inspect the versions actually installed in
the virtualenv.
Hint: try $ pipenv lock --pre if it is a pre-release dependency.
ERROR: ResolutionImpossible: for help visit 
https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-depende
ncy-conflicts

Your dependencies could not be resolved. You likely have a mismatch in your 
sub-dependencies.
You can use $ pipenv run pip install <requirement_name> to bypass this 
mechanism, then run $ pipenv graph to inspect the versions actually installed in
the virtualenv.
Hint: try $ pipenv lock --pre if it is a pre-release dependency.
ERROR: Failed to lock Pipfile.lock!

File name: model-servers/vllm/0.6.6/Pipfile.lock

Command failed: pipenv lock
Locking  dependencies...
CRITICAL:pipenv.patched.pip._internal.resolution.resolvelib.factory:Cannot 
install -r /tmp/pipenv-9jv39qe_-requirements/pipenv-_az8m5tq-constraints.txt 
(line 6) and torch==2.3.0+cu121 because these package versions have conflicting 
dependencies.
[ResolutionFailure]:   File 
"/opt/containerbase/tools/pipenv/2025.0.1/3.11.12/lib/python3.11/site-packages/p
ipenv/resolver.py", line 451, in main
[ResolutionFailure]:       _main(
[ResolutionFailure]:   File 
"/opt/containerbase/tools/pipenv/2025.0.1/3.11.12/lib/python3.11/site-packages/p
ipenv/resolver.py", line 436, in _main
[ResolutionFailure]:       resolve_packages(
[ResolutionFailure]:   File 
"/opt/containerbase/tools/pipenv/2025.0.1/3.11.12/lib/python3.11/site-packages/p
ipenv/resolver.py", line 400, in resolve_packages
[ResolutionFailure]:       results, resolver = resolve_deps(
[ResolutionFailure]:       ^^^^^^^^^^^^^
[ResolutionFailure]:   File 
"/opt/containerbase/tools/pipenv/2025.0.1/3.11.12/lib/python3.11/site-packages/p
ipenv/utils/resolver.py", line 979, in resolve_deps
[ResolutionFailure]:       results, hashes, internal_resolver = 
actually_resolve_deps(
[ResolutionFailure]:       ^^^^^^^^^^^^^^^^^^^^^^
[ResolutionFailure]:   File 
"/opt/containerbase/tools/pipenv/2025.0.1/3.11.12/lib/python3.11/site-packages/p
ipenv/utils/resolver.py", line 747, in actually_resolve_deps
[ResolutionFailure]:       resolver.resolve()
[ResolutionFailure]:   File 
"/opt/containerbase/tools/pipenv/2025.0.1/3.11.12/lib/python3.11/site-packages/p
ipenv/utils/resolver.py", line 474, in resolve
[ResolutionFailure]:       raise ResolutionFailure(message=e)
Your dependencies could not be resolved. You likely have a mismatch in your 
sub-dependencies.
You can use $ pipenv run pip install <requirement_name> to bypass this 
mechanism, then run $ pipenv graph to inspect the versions actually installed in
the virtualenv.
Hint: try $ pipenv lock --pre if it is a pre-release dependency.
ERROR: ResolutionImpossible: for help visit 
https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-depende
ncy-conflicts

Your dependencies could not be resolved. You likely have a mismatch in your 
sub-dependencies.
You can use $ pipenv run pip install <requirement_name> to bypass this 
mechanism, then run $ pipenv graph to inspect the versions actually installed in
the virtualenv.
Hint: try $ pipenv lock --pre if it is a pre-release dependency.
ERROR: Failed to lock Pipfile.lock!

Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

renovate bot force-pushed the renovate/pypi-vllm-vulnerability branch from 8a6c54d to aa29578 Compare February 8, 2025 04:00

renovate bot changed the title ~~Update dependency vllm to v0.7.0 [SECURITY]~~ Update dependency vllm to v0.7.2 [SECURITY] Feb 8, 2025

renovate bot changed the title ~~Update dependency vllm to v0.7.2 [SECURITY]~~ Update dependency vllm to v0.7.2 [SECURITY] - autoclosed Mar 12, 2025

renovate bot closed this Mar 12, 2025

renovate bot deleted the renovate/pypi-vllm-vulnerability branch March 12, 2025 15:45

renovate bot changed the title ~~Update dependency vllm to v0.7.2 [SECURITY] - autoclosed~~ Update dependency vllm to v0.7.2 [SECURITY] Mar 12, 2025

renovate bot reopened this Mar 12, 2025

renovate bot force-pushed the renovate/pypi-vllm-vulnerability branch 2 times, most recently from aa29578 to 4ce675b Compare March 14, 2025 03:59

renovate bot force-pushed the renovate/pypi-vllm-vulnerability branch from 4ce675b to 3d8e01c Compare March 19, 2025 18:23

renovate bot changed the title ~~Update dependency vllm to v0.7.2 [SECURITY]~~ Update dependency vllm to v0.8.0 [SECURITY] Mar 19, 2025

renovate bot changed the title ~~Update dependency vllm to v0.8.0 [SECURITY]~~ Update dependency vllm to v0.8.0 [SECURITY] - autoclosed Mar 21, 2025

renovate bot closed this Mar 21, 2025

renovate bot changed the title ~~Update dependency vllm to v0.8.0 [SECURITY] - autoclosed~~ Update dependency vllm to v0.8.0 [SECURITY] Mar 26, 2025

renovate bot reopened this Mar 26, 2025

renovate bot force-pushed the renovate/pypi-vllm-vulnerability branch from 6bdc2be to 3d8e01c Compare March 26, 2025 04:00

renovate bot changed the title ~~Update dependency vllm to v0.8.0 [SECURITY]~~ Update dependency vllm to v0.8.2 [SECURITY] Mar 26, 2025

renovate bot force-pushed the renovate/pypi-vllm-vulnerability branch from 3d8e01c to caf06e0 Compare March 29, 2025 11:40

renovate bot force-pushed the renovate/pypi-vllm-vulnerability branch from caf06e0 to 4b431d9 Compare April 16, 2025 20:06

renovate bot changed the title ~~Update dependency vllm to v0.8.2 [SECURITY]~~ Update dependency vllm [SECURITY] Apr 16, 2025

Update dependency vllm to v0.8.5 [SECURITY]

9e59c22

Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

renovate bot force-pushed the renovate/pypi-vllm-vulnerability branch from 4b431d9 to 9e59c22 Compare May 1, 2025 16:21

renovate bot changed the title ~~Update dependency vllm [SECURITY]~~ Update dependency vllm to v0.8.5 [SECURITY] May 1, 2025

renovate bot changed the title ~~Update dependency vllm to v0.8.5 [SECURITY]~~ Update dependency vllm to v0.8.5 [SECURITY] - autoclosed May 8, 2025

renovate bot closed this May 8, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Update dependency vllm to v0.8.5 [SECURITY] - autoclosed #28

Update dependency vllm to v0.8.5 [SECURITY] - autoclosed #28

Uh oh!

renovate bot commented Jan 29, 2025 •

edited

Loading

Uh oh!

renovate bot commented Jan 29, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Update dependency vllm to v0.8.5 [SECURITY] - autoclosed #28

Update dependency vllm to v0.8.5 [SECURITY] - autoclosed #28

Uh oh!

Conversation

renovate bot commented Jan 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

vllm: Malicious model to RCE by torch.load in hf_model_weights_iterator

Details

Description

Impact

References

Severity

References

vLLM uses Python 3.12 built-in hash() which leads to predictable hash collisions in prefix cache

Details

Summary

Details

Impact

Solution

References

Severity

References

CVE-2025-24357 Malicious model remote code execution fix bypass with PyTorch < 2.6.0

Details

Description

Background Knowledge

Fix

Credit

Severity

References

vLLM denial of service via outlines unbounded cache on disk

Details

Impact

Patches

Workarounds

References

Severity

References

vLLM Allows Remote Code Execution via Mooncake Integration

Details

Summary

Details

Impact

Remediation

Severity

References

vLLM deserialization vulnerability in vllm.distributed.GroupCoordinator.recv_object

Details

Maintainer perspective

Severity

References

vLLM vulnerable to Denial of Service by abusing xgrammar cache

Details

Impact

Patches

Workarounds

References

Severity

References

Data exposure via ZeroMQ on multi-node vLLM deployment

Details

Impact

Detailed Analysis

Patches

Workarounds

References

Severity

References

vLLM Vulnerable to Remote Code Execution via Mooncake Integration

Details

Impacted Deployments

Description

Severity

References

Release Notes

v0.8.5

Highlights

Model Support

V1 Engine

Features

renovate bot commented Jan 29, 2025 •

edited

Loading

`v0.8.5`

renovate bot commented Jan 29, 2025 •

edited

Loading