[Bugfix] Limit profiling run sequence length by max_model_len #14785

kylesayrs · 2025-03-14T00:05:36Z

Purpose

Fix the whisper offline inference example.
Under specific conditions (a short max_model_len and a low max_num_seqs), the sequence length generated for the profiling run can exceed max_model_len.
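To make the failure condition concrete, here is a minimal standalone sketch of the per-sequence split used by the profiling run (the same expression that appears in the model_runner.py diff later in this PR). The helper name `profiling_seq_lens` and the numeric values are illustrative assumptions, not taken from vLLM or from the failing example.

```python
def profiling_seq_lens(max_num_batched_tokens: int, max_num_seqs: int) -> list[int]:
    """Split the token budget as evenly as possible across dummy sequences."""
    return [
        # The first (budget % max_num_seqs) sequences get one extra token.
        max_num_batched_tokens // max_num_seqs
        + (group_id < max_num_batched_tokens % max_num_seqs)
        for group_id in range(max_num_seqs)
    ]


# Illustrative values only: a short max_model_len and a low max_num_seqs.
max_model_len = 448
seq_lens = profiling_seq_lens(max_num_batched_tokens=8192, max_num_seqs=2)

# Every dummy sequence here is longer than the model can accept.
too_long = [n for n in seq_lens if n > max_model_len]
print(seq_lens, too_long)
```

With few sequences sharing a large token budget, each dummy sequence can end up far longer than max_model_len, which is the situation described above.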

Changes

Add a max_model_len cap to the enc_dec and standard model runners

Testing

This example used to fail during profiling:

python3 examples/offline_inference/audio_language.py --model whisper

Signed-off-by: Kyle Sayers <[email protected]>

github-actions · 2025-03-14T00:05:46Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

examples/offline_inference/audio_language.py

DarkLight1337 · 2025-03-14T03:45:37Z

vllm/worker/model_runner.py

            for group_id in range(max_num_seqs):
                seq_len = (max_num_batched_tokens // max_num_seqs +
                           (group_id < max_num_batched_tokens % max_num_seqs))
+                seq_len = min(seq_len, self.model_config.max_model_len)
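The effect of the added line can be sketched outside the runner as follows. This is a simplified illustration, not the runner code: the function name `capped_profiling_seq_lens` is hypothetical, and `max_model_len` is passed as a plain argument in place of `self.model_config.max_model_len`.

```python
def capped_profiling_seq_lens(max_num_batched_tokens: int,
                              max_num_seqs: int,
                              max_model_len: int) -> list[int]:
    """Per-sequence lengths for a dummy profiling batch, capped at max_model_len."""
    seq_lens = []
    for group_id in range(max_num_seqs):
        # Even split of the token budget; early groups absorb the remainder.
        seq_len = (max_num_batched_tokens // max_num_seqs +
                   (group_id < max_num_batched_tokens % max_num_seqs))
        # The cap added in this PR: never exceed the model's maximum length.
        seq_len = min(seq_len, max_model_len)
        seq_lens.append(seq_len)
    return seq_lens


# Small illustrative numbers: the cap trims the first sequence from 4 to 3.
capped = capped_profiling_seq_lens(10, 3, 3)
print(capped, sum(capped))
```

Note that when the cap takes effect, the dummy batch holds fewer than max_num_batched_tokens tokens in total (9 rather than 10 in this toy case).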


Please check other model runners as well

I'm not intimately familiar with the other runners, but I've replicated the cap on xpu

There's also the openvino model runner

Signed-off-by: Kyle Sayers <[email protected]>

DarkLight1337 · 2025-03-16T03:50:54Z

LGTM now, thanks for your patience!

limit profiling run sequence length by max_model_len

60e7182

Signed-off-by: Kyle Sayers <[email protected]>

mergify bot added the documentation label Mar 14, 2025

DarkLight1337 reviewed Mar 14, 2025

examples/offline_inference/audio_language.py

DarkLight1337 reviewed Mar 14, 2025

kylesayrs added 3 commits March 14, 2025 14:51

add back explicit model length

943b9ab

Signed-off-by: Kyle Sayers <[email protected]>

cap xpu, add assertion error

2e05b45

Signed-off-by: Kyle Sayers <[email protected]>

cap openvino

d40e9e9

Signed-off-by: Kyle Sayers <[email protected]>

DarkLight1337 approved these changes Mar 16, 2025

DarkLight1337 enabled auto-merge (squash) March 16, 2025 03:51

github-actions bot added the ready label Mar 16, 2025

vllm-bot merged commit d30aa7e into vllm-project:main Mar 16, 2025
44 of 47 checks passed

kylesayrs deleted the kylesayrs/cap-seq-len-profile-run branch March 16, 2025 15:55

kylesayrs restored the kylesayrs/cap-seq-len-profile-run branch March 16, 2025 16:04

DarkLight1337 added a commit to DarkLight1337/vllm that referenced this pull request Mar 16, 2025

Revert "[Bugfix] Limit profiling run sequence length by max_model_len (…

105289f

…vllm-project#14785)" This reverts commit d30aa7e. Signed-off-by: DarkLight1337 <[email protected]>

vllm-bot pushed a commit that referenced this pull request Mar 16, 2025

Revert "[Bugfix] Limit profiling run sequence length by max_model_len (…

f6137ad

…#14785) (#14892) Signed-off-by: DarkLight1337 <[email protected]>

DefTruth pushed a commit to DefTruth/vllm that referenced this pull request Mar 17, 2025

[Bugfix] Limit profiling run sequence length by max_model_len (vllm-p…

40e7f44

…roject#14785) Signed-off-by: Kyle Sayers <[email protected]> Signed-off-by: DefTruth <[email protected]>

kylesayrs mentioned this pull request Mar 18, 2025

[Bugfix] Limit max_num_batched_tokens by max_num_seqs * max_model_len #15062

Closed

lulmer pushed a commit to lulmer/vllm that referenced this pull request Apr 7, 2025

[Bugfix] Limit profiling run sequence length by max_model_len (vllm-p…

83b9fe5

…roject#14785) Signed-off-by: Kyle Sayers <[email protected]> Signed-off-by: Louis Ulmer <[email protected]>

ckhordiasma mentioned this pull request Apr 17, 2025

[do not merge] pr test for nm changes into 2.20 red-hat-data-services/vllm#107

Closed

shreyankg pushed a commit to shreyankg/vllm that referenced this pull request May 3, 2025

[Bugfix] Limit profiling run sequence length by max_model_len (vllm-p…

e90d140

…roject#14785) Signed-off-by: Kyle Sayers <[email protected]>

shreyankg pushed a commit to shreyankg/vllm that referenced this pull request May 3, 2025

Revert "[Bugfix] Limit profiling run sequence length by max_model_len (…

f52e5df

…vllm-project#14785) (vllm-project#14892) Signed-off-by: DarkLight1337 <[email protected]>

RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025

[Bugfix] Limit profiling run sequence length by max_model_len (vllm-p…

8d36589

…roject#14785) Signed-off-by: Kyle Sayers <[email protected]> Signed-off-by: Mu Huai <[email protected]>


kylesayrs commented Mar 14, 2025 • edited by DarkLight1337


3 participants
