[MISC][V1] Handle exception of current_platform.get_device_name() in arg_utils #14379
cc @youkaichao
Signed-off-by: Cody Yu <[email protected]>
@youkaichao we should be able to merge this
…arg_utils (vllm-project#14379) Signed-off-by: Cody Yu <[email protected]> Signed-off-by: Louis Ulmer <[email protected]>
The call to `current_platform.get_device_name()` in `arg_utils.py` happens too early and is not compatible with Ray, because Ray Serve may import vLLM in the main actor to use some of its data structures. Since this actor has no GPU and is not the actor that actually runs the engine, the call raises an exception about no GPU devices being available. This PR makes this early device query optional to work around the issue. It should not affect any existing use cases or the user experience, because the only thing that changes in `EngineArgs._override_v1_engine_args` when the device is not queried is `default_max_num_batched_tokens`.

cc @WoosukKwon
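For reviewers, here is a minimal sketch of the pattern described above, not the actual diff. It assumes the device query can raise when no GPU is visible, and that the device name is only used to pick a default. The helper names (`safe_device_name`, `default_max_num_batched_tokens`), the lookup table, and the numeric values are all hypothetical placeholders rather than vLLM's real code or defaults.

```python
from typing import Callable, Dict, Optional

# Placeholder values for illustration only; NOT vLLM's real defaults.
GENERIC_DEFAULT = 2048
TUNED_DEFAULTS: Dict[str, int] = {"h100": 8192}


def safe_device_name(query: Callable[[], str]) -> Optional[str]:
    """Return the lowercased device name, or None if the query fails."""
    try:
        return query().lower()
    except Exception:
        # E.g. a Ray Serve actor without a GPU: treat the device as unknown.
        return None


def default_max_num_batched_tokens(device_name: Optional[str]) -> int:
    """Pick a device-tuned default if the device is known, else a generic one."""
    if device_name is not None:
        for key, value in TUNED_DEFAULTS.items():
            if key in device_name:
                return value
    return GENERIC_DEFAULT


def query_without_gpu() -> str:
    # Stand-in for a device query running on a process with no GPU.
    raise RuntimeError("no GPU devices found")


def query_with_gpu() -> str:
    # Stand-in for a device query running on a GPU worker.
    return "NVIDIA H100 80GB HBM3"


no_gpu_default = default_max_num_batched_tokens(safe_device_name(query_without_gpu))
gpu_default = default_max_num_batched_tokens(safe_device_name(query_with_gpu))
```

In this sketch a failed query only changes which default is chosen; constructing the arguments no longer fails on a process without a GPU.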