Sync release with main #230

dtrifiro · 2024-11-13T10:48:08Z

No description provided.

openshift-ci · 2024-11-13T10:48:13Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

openshift-ci · 2024-11-13T10:48:17Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dtrifiro

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [dtrifiro]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

## Description

Syncing to upstream vLLM tag [v0.9.0](https://github.com/vllm-project/vllm/tree/v0.9.0)

This PR does *not* target any tag upstream as of now.

Builds on top of: neuralmagic/nm-vllm-ent@eea2469

Git log:

```
commit 922878c (HEAD -> upstream-sync-2025-05-21-notag, nm-fork/upstream-sync-2025-05-21-notag)
Merge: a6275cd 5873877
Author: Selbi Nuryyeva <[email protected]>
Date:   Wed May 28 16:21:22 2025 -0400

    Merge tag 'v0.9.0' into upstream-sync-2025-05-21-notag

commit 5873877 (tag: v0.9.0, upstream/releases/v0.9.0)
Author: Michael Goin <[email protected]>
Date:   Tue May 27 12:05:37 2025 -0400

    [Bugfix] Mistral tool calling when content is list (vllm-project#18729)

    Signed-off-by: mgoin <[email protected]>

commit 696259c
Author: Cyrus Leung <[email protected]>
Date:   Tue May 27 23:45:48 2025 +0800

    [Core] Automatically cast multi-modal input dtype (vllm-project#18756)

    Signed-off-by: DarkLight1337 <[email protected]>

commit 6b6d496
Author: chunxiaozheng <[email protected]>
Date:   Tue May 27 21:08:44 2025 +0800

    optimize get_kv_cache_torch_dtype (vllm-project#18531)

    Signed-off-by: idellzheng <[email protected]>

commit aaa4ac1
Author: cascade <[email protected]>
Date:   Tue May 27 05:06:34 2025 -0700

    Disable prefix cache by default for benchmark (vllm-project#18639)

    Signed-off-by: cascade812 <[email protected]>

commit 06a0338
Author: Mark McLoughlin <[email protected]>
Date:   Tue May 27 10:37:06 2025 +0100

    [V1][Metrics] Add API for accessing in-memory Prometheus metrics (vllm-project#17010)

    Signed-off-by: Mark McLoughlin <[email protected]>
```

Commands

```
git fetch upstream --tags
git checkout -b upstream-sync-2025-05-21-notag
git merge v0.9.0
```

## Testing

[accept-sync](https://github.com/neuralmagic/nm-cicd/actions/runs/15309901499) run

Notes: A couple of lm-evals are failing, but that is because model locations are not fully updated and some model card numbers need updating as a result. One unit test is failing due to the flashinfer version.
These updates will be done in a separate PR in the nm-cicd repo. The merge is good to go.

maxdebayser and others added 30 commits October 8, 2024 10:58

add rsync to UBI images

e141de1

Signed-off-by: Max de Bayser <[email protected]>

[TPU] Fix TPU SMEM OOM by Pallas paged attention kernel (vllm-project#9350)

473e7b3

[Frontend] merge beam search implementations (vllm-project#9296)

4d31cd4

[Model] Make llama3.2 support multiple and interleaved images (vllm-project#9095)

f0fe4fe

[Bugfix] Clean up some cruft in mamba.py (vllm-project#9343)

169b530

[Frontend] Clarify model_type error messages (vllm-project#9345)

44eaa5a

[Doc] Fix code formatting in spec_decode.rst (vllm-project#9348)

8e836d9

[Bugfix] Update InternVL input mapper to support image embeds (vllm-project#9351)

55e081f

[BugFix] Fix chat API continuous usage stats (vllm-project#9357)

e9d517f

pass ignore_eos parameter to all benchmark_serving calls (vllm-project#9349)

5d264f4

[Misc] Directly use compressed-tensors for checkpoint definitions (vllm-project#8909)

22f8a69

Co-authored-by: DarkLight1337 <[email protected]>

[Bugfix] Fix vLLM UsageInfo and logprobs None AssertionError with empty token_ids (vllm-project#9034)

ba30942

Co-authored-by: Nick Hill <[email protected]>

[Bugfix][CI/Build] Fix CUDA 11.8 Build (vllm-project#9386)

717a5f8

[Bugfix] Molmo text-only input bug fix (vllm-project#9397)

ed92013

Co-authored-by: sanghol <[email protected]> Co-authored-by: Roger Wang <[email protected]> Co-authored-by: Roger Wang <[email protected]>

[Misc] Standardize RoPE handling for Qwen2-VL (vllm-project#9250)

7e7eae3

[Model] VLM2Vec, the first multimodal embedding model in vLLM (vllm-project#9303)

7abba39

[CI/Build] Test VLM embeddings (vllm-project#9406)

1de76a0

[Core] Rename input data types (vllm-project#8688)

cee711f

[Misc] Consolidate example usage of OpenAI client for multimodal models (vllm-project#9412)

59230ef

Co-authored-by: DarkLight1337 <[email protected]>

[Model] Support SDPA attention for Molmo vision backbone (vllm-project#9410)

cf1d62a

Support mistral interleaved attn (vllm-project#9414)

415f76a

[Kernel][Model] Improve continuous batching for Jamba and Mamba (vllm-project#9189)

fb60ae9

[Model][Bugfix] Add FATReLU activation and support for openbmb/MiniCPM-S-1B-sft (vllm-project#9396)

5b8a1fd

[Performance][Spec Decode] Optimize ngram lookup performance (vllm-project#9333)

8345045

[CI/Build] mypy: Resolve some errors from checking vllm/engine (vllm-project#9267)

776dbd7

Signed-off-by: Russell Bryant <[email protected]>

[Bugfix][Kernel] Prevent integer overflow in fp8 dynamic per-token quantize kernel (vllm-project#9425)

c3fab5f

[BugFix] [Kernel] Fix GPU SEGV occurring in int8 kernels (vllm-project#9391)

92d86da

Add notes on the use of Slack (vllm-project#9442)

dbfa8d3

[Kernel] Add Exllama as a backend for compressed-tensors (vllm-project#9395)

e312e52

[Misc] Print stack trace using logger.exception (vllm-project#9461)

390be74

dtrifiro and others added 18 commits November 4, 2024 10:47

make packages explicit

047c72a

libsodium: use -j$(nproc)

00b49a4

Dockerfile.ubi: cleanup flashattention build process

5c2ae9f

get rid of .github dir

305186b

add symlink to libroctx before starting vllm build

655a28c

cleanup space

4838a72

cleanup LD_LIBRARY_PATH usage

f89b422

Sync with [email protected]

3f4e5bf

Merge pull request #225 from dtrifiro/sync-with-0.6.3.post1

39c8fe0

sync with upstream @ v0.6.3.post1

Dockerfile.rocm.ubi: bump ROCm to 6.2.4

460e2b3

Dockerfile.rocm.ubi: bump torch to 20241107 nightly

2ce32ad

Dockerfile.rocm.ubi: reduce image size by getting rid of system rocm libraries

b7d149b

These are included in the Python venv at /opt/vllm (torch install), so we can get rid of them.

Dockerfile.rocm.ubi: install libdrm-amdgpu to avoid errors when retrieving device information

9dfe77a

This prevents the logs from being spammed with `amdgpu.ids: No such file or directory`.

Dockerfile.rocm.ubi: add hsa-rocr-devel dependency

227f802

use PYTHON_VERSION arg where required

3fc45d1

slim down final image

bc65c0e

Dockerfile.rocm.ubi: use BASE_UBI_IMAGE_TAG in AMD repo URL

fbaab94

Dockerfile.rocm.ubi: make sure the venv's torch libraries are available

1024f99

This makes sure that the shared libraries in the Python venv are found by the linker.
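
One way the venv's torch libraries could be exposed to the dynamic linker is sketched below. The venv path `/opt/vllm` comes from the commit messages above. The Python version `3.12`, the `site-packages/torch/lib` layout, and the helper names `venv_torch_lib_dir` and `prepend_path` are assumptions for illustration only. This is not the content of Dockerfile.rocm.ubi, which may use a different mechanism.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the directory where a pip-installed torch
# keeps its shared libraries inside a virtual environment.
# Assumes the usual <venv>/lib/python<X.Y>/site-packages layout.
venv_torch_lib_dir() {
    local venv="$1" pyver="$2"
    printf '%s/lib/python%s/site-packages/torch/lib\n' "$venv" "$pyver"
}

# Hypothetical helper: prepend a directory to a colon-separated search path
# without leaving an empty component (the dynamic linker treats an empty
# entry as the current working directory).
prepend_path() {
    local dir="$1" current="$2"
    if [ -z "$current" ]; then
        printf '%s\n' "$dir"
    else
        printf '%s:%s\n' "$dir" "$current"
    fi
}

# /opt/vllm is the venv named in the commit message; 3.12 is an assumed version.
LD_LIBRARY_PATH="$(prepend_path "$(venv_torch_lib_dir /opt/vllm 3.12)" "${LD_LIBRARY_PATH:-}")"
export LD_LIBRARY_PATH
```

In a Dockerfile the same effect would typically be achieved with an `ENV LD_LIBRARY_PATH=...` instruction. The shell form above only makes the path construction explicit.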

openshift-ci bot added the do-not-merge/work-in-progress label Nov 13, 2024

openshift-ci bot added the approved label Nov 13, 2024

dtrifiro marked this pull request as ready for review November 13, 2024 10:48

dtrifiro requested a review from njhill as a code owner November 13, 2024 10:48

openshift-ci bot removed the do-not-merge/work-in-progress label Nov 13, 2024

openshift-ci bot requested review from NickLucche and RH-steve-grubb November 13, 2024 10:48

dtrifiro merged commit 9ac4882 into release Nov 13, 2024
0 of 7 checks passed

groenenboomj pushed a commit that referenced this pull request Feb 24, 2025

Added sccache timeout for vllm build (#230)

1ec8aaf

Co-authored-by: maleksan85 <[email protected]>
