-
-
Notifications
You must be signed in to change notification settings - Fork 11.3k
[Bugfix] Flush TunableOp results before worker processes are destroyed. #13623
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if ((tunable.is_enabled() is True) and | |
| (tunable.tuning_is_enabled() is True) and | |
| (tunable.record_untuned_is_enabled() is False)): | |
| tunable.write_file() | |
| if tunable.is_enabled() and tunable.tuning_is_enabled() and | |
| not tunable.record_untuned_is_enabled(): | |
| tunable.write_file() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if (tunable.is_enabled() and tunable.tuning_is_enabled() and | |
| not tunable.record_untuned_is_enabled()): | |
| if tunable.is_enabled() and tunable.tuning_is_enabled() and | |
| not tunable.record_untuned_is_enabled(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I tried without the outermost () and I got a syntax error. Maybe because it is split across two lines?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, it needs \ to break long if statement into multiple lines, or bracket.
|
One more thing, in case it matters, the |
|
Please merge from main and fix the pre-commit errors. |
Can you please put a comment in the code about this? |
78882d0 to
86869af
Compare
Signed-off-by: Nichols A. Romero <[email protected]>
Signed-off-by: Nichols A. Romero <[email protected]>
Signed-off-by: Nichols A. Romero <[email protected]>
86869af to
96be44d
Compare
…d. (vllm-project#13623) Signed-off-by: Nichols A. Romero <[email protected]>
…d. (vllm-project#13623) Signed-off-by: Nichols A. Romero <[email protected]> Signed-off-by: Louis Ulmer <[email protected]>
…d. (vllm-project#13623) Signed-off-by: Nichols A. Romero <[email protected]>
This PR is a bug fix for users that run TunableOp on multi-GPU vLLM workloads. This is mostly ROCm users.
Currently, TunableOp with worker processes enabled will only write the results for GPU 0 from the main process. Normally, the TunableOp results are flushed to the file system when the TunableOp C++ destructor is called. The worker processes appear to be destroyed before TunableOp destructor is called. What users are experiencing is that the main process ends up with TunableOp results, while the other worker processes that are driving the other GPU end up with an empty TunableOp results file. We now force a flush of the TunableOp results before the worker processes are terminated.
cc: @hongxiayang