-
-
Notifications
You must be signed in to change notification settings - Fork 11.3k
[BugFix][P/D] Fix for cases where _recving_transfers can be cleaned up when *all* transfer done #19874
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summary of Changes
Hello @lk-chen, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request contains a critical bug fix for the distributed key-value transfer mechanism, specifically refining how completed transfers are identified and cleaned up. The change ensures that a group of related transfers is only considered fully complete and removed from active tracking when all its individual components have successfully finished, thereby improving the robustness and correctness of the distributed system's state management.
Highlights
- Bug Fix: Distributed KV Transfer Cleanup: I've addressed a bug in the
_pop_done_transfersfunction withinnixl_connector.py. Previously, a request's associated transfers (_recving_transfers) could be prematurely cleaned up if only one of its multiple transfer handles completed. The fix ensures that a request ID is only marked as fully done and removed from tracking when all individual transfer handles belonging to that request have successfully reached a 'DONE' state. - Improved Transfer State Management: The logic now collects the states of all transfer handles for a given request ID before deciding to remove the request from the active transfers list. This prevents potential issues where subsequent operations might expect an ongoing transfer to still be tracked, only to find it already removed.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request addresses a critical bug in _pop_done_transfers where a request could be prematurely marked as done and removed from tracking if any single one of its associated transfers completed, rather than waiting for all transfers for that request to complete. The change correctly introduces logic to collect the status of all transfers for a given request and only considers the request done if all its transfers are in the "DONE" state. This significantly improves the robustness of transfer management.
It's also noted that the PR description is currently a template and should be filled out to provide context for the changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The core logic change correctly ensures that a request ID is removed from the transfers dictionary and added to done_req_ids only when all associated transfer handles have reached the "DONE" state. This fixes a potential issue where requests might have been considered complete prematurely.
Additionally, this change introduces a new behavior for request IDs that are present in transfers but have an empty list of handles:
- Previously, such requests would not be processed by the inner loop and thus not removed from
transfers. - With the new logic,
xfer_statswill be an empty list, andall(s == "DONE" for s in [])evaluates toTrue. Consequently, these requests will now be correctly identified as done and cleaned up.
This appears to be a more robust handling of such edge cases, as a request with no pending transfers should indeed be considered complete.
njhill
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @lk-chen
njhill
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @lk-chen lgtm, just linting errors to fix
|
This PR should be mergable @njhill |
|
@lk-chen could you rebase, it should fix the v1-test failure at least. |
Signed-off-by: Linkun Chen <[email protected]>
Signed-off-by: Linkun Chen <[email protected]>
Signed-off-by: Linkun Chen <[email protected]>
Signed-off-by: Linkun Chen <[email protected]>
3571837 to
88feb0a
Compare
Signed-off-by: Linkun Chen <[email protected]>
88feb0a to
ef3e8ea
Compare
|
@njhill v1-test passed https://buildkite.com/vllm/ci/builds/22459 |
…p when *all* transfer done (vllm-project#19874) Signed-off-by: Linkun Chen <[email protected]> Signed-off-by: juncheoll <[email protected]>
…p when *all* transfer done (vllm-project#19874) Signed-off-by: Linkun Chen <[email protected]> Signed-off-by: fhl <[email protected]>
|
Hey @lk-chen , just to understand current state better, do you have a setup where you are able to produce multiple xfers per request? |
|
Hi @NickLucche I don't really know a stable setup that can reproduce it in real run, but only mimic it in unit test. The data struct (mapping from id to all xfer handles) implies this could potentially happen anyway. |
|
The point is that when a request arrives it is assigned a single remote for prefilling (or at least this is the assumption) and a single D for decoding, which will read all blocks from that P remote in the form of remote_block_ids->local_block_ids.
I believe this is a cruft from the initial porting work. |
|
Thank you @NickLucche , I see your point, but at the moment I honestly don't have more insights how this happened |
|
@NickLucche we can revert this but should also change the handles to be a single value rather than a list, right? |
Yep I can put up a PR tomorrow unless you guys have bw before then. I wouldn't revert all of this PR though, the tests are very much needed. |
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.Purpose
self. _recving_transfersis mapping from request_id to list of all ongoing transfers. We should not evict request id until all transfers are done.Test Plan
unit test
Test Result
(Optional) Documentation Update