[V1] Refactor num_computed_tokens logic #15307
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small subset of tests runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run the full CI to test the changes comprehensively before merging. To run CI, PR reviewers can add the ready label to the PR. 🚀
WoosukKwon left a comment:
Thanks for doing this! Left some comments.
(force-pushed from 0a92097 to 56208c6)
(force-pushed from 22e14ab to a426db0)
WoosukKwon left a comment:
LGTM. Thanks for the PR!
@comaniac Please check the CI failures. It seems related.

I will do a quick review now too...
njhill left a comment:
Thanks @comaniac, I left a few comments too
njhill left a comment:
Thanks @comaniac, yep this looks good to me!
njhill left a comment:
Thanks @comaniac!
A prerequisite PR to enable continuing to schedule prefill chunks under pipeline parallelism (PP).
This PR achieves the following flow:
- In `.schedule()`, we advance `num_computed_tokens` right after scheduling a batch, so that in the case of chunked prefill, we can continue scheduling the next chunk (not covered in this PR).
- In `.update_from_output()`, we no longer check whether we should append the sampled tokens to requests using `num_computed_tokens` and `num_tokens`, but instead rely on whether the model runner provides sampled tokens. Moreover, we do not advance `num_computed_tokens` in this function; we only decrease `num_computed_tokens` when spec tokens get rejected (see the sketch after this list).
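
To make the flow concrete, here is a minimal sketch of the two methods. The `Request` dataclass, the `Scheduler` class, the `token_budget` argument, and the rejection accounting below are simplified stand-ins for illustration, not vLLM's actual classes or signatures.

```python
# A minimal sketch of the refactored flow, using hypothetical stand-in
# classes (not vLLM's actual implementations).
from dataclasses import dataclass, field


@dataclass
class Request:
    request_id: str
    num_tokens: int                  # prompt tokens + tokens generated so far
    num_computed_tokens: int = 0     # tokens already scheduled for KV computation
    spec_token_ids: list[int] = field(default_factory=list)


class Scheduler:
    def schedule(self, requests: list[Request], token_budget: int) -> dict[str, int]:
        """Advance num_computed_tokens at schedule time, so the next prefill
        chunk of a request can be scheduled before the previous chunk's
        output comes back (the behavior pipeline parallelism needs)."""
        num_scheduled: dict[str, int] = {}
        for req in requests:
            remaining = req.num_tokens - req.num_computed_tokens
            num_new = min(remaining, token_budget)
            if num_new <= 0:
                continue
            # Key change: advance here, not in update_from_output().
            req.num_computed_tokens += num_new
            token_budget -= num_new
            num_scheduled[req.request_id] = num_new
        return num_scheduled

    def update_from_output(self, requests: list[Request],
                           sampled: dict[str, list[int]]) -> None:
        """Append tokens only when the model runner actually returned
        samples; never advance num_computed_tokens here."""
        for req in requests:
            token_ids = sampled.get(req.request_id)
            if not token_ids:
                continue  # mid-prefill: the runner produced no sample yet
            # Speculative decoding: with k draft tokens scheduled and
            # a of them accepted, the runner returns a + 1 tokens, so roll
            # back num_computed_tokens by the k - a rejected tokens.
            num_rejected = len(req.spec_token_ids) - (len(token_ids) - 1)
            req.num_computed_tokens -= max(num_rejected, 0)
            req.num_tokens += len(token_ids)
```

Because `schedule()` no longer waits for `update_from_output()` to advance `num_computed_tokens`, a later `schedule()` call can pick up the next chunk of the same request immediately, which is the property the PP follow-up relies on.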