[V1] Refactor num_computed_tokens logic #15307
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small subset of tests runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run the full CI to test the changes comprehensively before merging. To run CI, PR reviewers can add the ready label to the PR. 🚀
WoosukKwon left a comment:
Thanks for doing this! Left some comments.
(force-pushed from 0a92097 to 56208c6)
(force-pushed from 22e14ab to a426db0)
WoosukKwon left a comment:
LGTM. Thanks for the PR!
@comaniac Please check the CI failures. It seems related.

I will do a quick review now too...
njhill left a comment:
Thanks @comaniac, I left a few comments too
njhill left a comment:
Thanks @comaniac, yep this looks good to me!
njhill left a comment:
Thanks @comaniac!
A prerequisite PR to enable continuing to schedule prefill chunks under pipeline parallelism (PP).
This PR achieves the following flow:
- In `.schedule()`, we advance `num_computed_tokens` right after scheduling a batch, so that in the case of chunked prefill, we can continue scheduling the next chunk (not covered in this PR).
- In `.update_from_output()`, we no longer check whether we should append the sampled tokens to requests using `num_computed_tokens` and `num_tokens`, but instead rely on whether the model runner provides sampled tokens. Moreover, we do not advance `num_computed_tokens` in this function; we only decrease `num_computed_tokens` when spec tokens get rejected (see the sketch after this list).
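
To make the flow concrete, here is a minimal sketch of the two methods. The `Request` dataclass, the `Scheduler` class, the `token_budget` argument, and the rejection accounting below are simplified stand-ins for illustration, not vLLM's actual classes or signatures.

```python
# A minimal sketch of the refactored flow, using hypothetical stand-in
# classes (not vLLM's actual implementations).
from dataclasses import dataclass, field


@dataclass
class Request:
    request_id: str
    num_tokens: int                  # prompt tokens + tokens generated so far
    num_computed_tokens: int = 0     # tokens already scheduled for KV computation
    spec_token_ids: list[int] = field(default_factory=list)


class Scheduler:
    def schedule(self, requests: list[Request], token_budget: int) -> dict[str, int]:
        """Advance num_computed_tokens at schedule time, so the next prefill
        chunk of a request can be scheduled before the previous chunk's
        output comes back (the behavior pipeline parallelism needs)."""
        num_scheduled: dict[str, int] = {}
        for req in requests:
            remaining = req.num_tokens - req.num_computed_tokens
            num_new = min(remaining, token_budget)
            if num_new <= 0:
                continue
            # Key change: advance here, not in update_from_output().
            req.num_computed_tokens += num_new
            token_budget -= num_new
            num_scheduled[req.request_id] = num_new
        return num_scheduled

    def update_from_output(self, requests: list[Request],
                           sampled: dict[str, list[int]]) -> None:
        """Append tokens only when the model runner actually returned
        samples; never advance num_computed_tokens here."""
        for req in requests:
            token_ids = sampled.get(req.request_id)
            if not token_ids:
                continue  # mid-prefill: the runner produced no sample yet
            # Speculative decoding: with k draft tokens scheduled and
            # a of them accepted, the runner returns a + 1 tokens, so roll
            # back num_computed_tokens by the k - a rejected tokens.
            num_rejected = len(req.spec_token_ids) - (len(token_ids) - 1)
            req.num_computed_tokens -= max(num_rejected, 0)
            req.num_tokens += len(token_ids)
```

Because `schedule()` no longer waits for `update_from_output()` to advance `num_computed_tokens`, a later `schedule()` call can pick up the next chunk of the same request immediately, which is the property the PP follow-up relies on.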