[BugFix] Fixed 'FlashInferAllToAllManager' object has no attribute 'prepare_workspace' #27862
base: main
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small subset of tests runs automatically, and you can ask your reviewers to trigger select CI tests on top of that. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀
Code Review
This pull request introduces two important fixes. The first correctly resolves an AttributeError by changing a method call from prepare_workspace to prepare_workspace_tensor, which is a direct and necessary bug fix. The second change addresses a suspected bug in the FlashInfer library where the topk_weights tensor's data type is incorrectly modified. While the intent to restore the original data type is correct, the implementation uses torch.view(), which is unsafe and likely incorrect. I've provided a critical comment to change this to torch.to() to ensure data correctness.
topk_weights = topk_weights.view(topk_weights_dtype)
Using torch.view() to change the data type is unsafe and likely incorrect. view() reinterprets the underlying bits of the tensor, so an int32 tensor's data will be read as garbage float32 values (unless the values are zero).
If the FlashInfer function incorrectly returns a tensor with an int32 dtype, the correct way to restore the original floating-point type is to perform a cast with torch.to(). This converts the integer values back to floating-point values (e.g., 1 becomes 1.0), which is the safe and intended operation.
This is a critical issue, as it could lead to silent correctness problems in the model's computations.
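For illustration only, here is a minimal sketch (not part of the PR) of the difference the reviewer describes between a value-preserving cast and a bit reinterpretation; the tensor values are made up:

```python
import torch

w = torch.tensor([1.0, 2.0, 3.0])  # float32 weights

# Value-preserving cast: each element is converted, 1.0 -> 1, 2.0 -> 2, ...
cast = w.to(torch.int32)

# Bit reinterpretation: the same 4-byte storage read as int32, yielding
# the raw IEEE-754 bit patterns rather than the numeric values.
bits = w.view(torch.int32)

print(cast.tolist())  # [1, 2, 3]
print(bits.tolist())  # [1065353216, 1073741824, 1077936128]
```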
Suggested change:
- topk_weights = topk_weights.view(topk_weights_dtype)
+ topk_weights = topk_weights.to(topk_weights_dtype)
view() is required here.
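A hedged sketch of why view() fits here, assuming the suspected FlashInfer behavior is only a dtype mislabeling of the same buffer; prepare_stub below is a hypothetical stand-in, not the real FlashInfer API:

```python
import torch

topk_weights = torch.rand(8, 2)  # float32 routing weights

def prepare_stub(w: torch.Tensor) -> torch.Tensor:
    # Hypothetical stand-in for the FlashInfer call that reportedly
    # returns the same storage tagged as torch.int32.
    return w.view(torch.int32)

topk_weights_dtype = topk_weights.dtype         # save the original dtype
mislabeled = prepare_stub(topk_weights)         # comes back as int32
restored = mislabeled.view(topk_weights_dtype)  # reinterpret the bits back

assert torch.equal(restored, topk_weights)      # values are unchanged
```

Under this assumption, a .to(topk_weights_dtype) cast at the same point would instead convert the spurious integer values to floats and destroy the weights.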
💡 Codex Review
Here are some automated review suggestions for this pull request.
topk_weights = topk_weights.view(topk_weights_dtype)
Restore topk_weights dtype via invalid Tensor.view call
The new code saves topk_weights.dtype and then calls topk_weights = topk_weights.view(topk_weights_dtype) to restore it. In PyTorch Tensor.view only accepts shape arguments; passing a dtype raises TypeError: 'torch.dtype' object cannot be interpreted as an integer, so any invocation of flashinfer_alltoall_dispatch now crashes before communication. If the intent is to cast back to the previous dtype, use .to(topk_weights_dtype) (or equivalent) instead of view.
Tensor.view() can accept torch.dtype arguments: https://docs.pytorch.org/docs/stable/generated/torch.Tensor.view.html
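As the linked documentation describes, Tensor.view(dtype) reinterprets the tensor's data under the new dtype (subject to element-size compatibility) rather than raising a TypeError; a quick round-trip sketch:

```python
import torch

w = torch.rand(4)               # float32
raw = w.view(torch.int32)       # same 4-byte elements, no TypeError
back = raw.view(torch.float32)  # viewing back restores the originals

assert back.dtype == torch.float32
assert torch.equal(back, w)
```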
Signed-off-by: Harry Huang <[email protected]>
Hi @mgoin @pavanimajety, this PR is ready for review. Could you please take a look when you have a moment? Thanks!
Purpose
This PR fixes the bug reported in issue #27655.
There are two main changes:
Fix AttributeError with the flashinfer_all2allv backend:

When using the flashinfer_all2allv backend, an AttributeError: "'FlashInferAllToAllManager' object has no attribute 'prepare_workspace'" is raised. This was caused by an incorrect method call to prepare_workspace instead of the correct prepare_workspace_tensor in the FlashInferAllToAllManager. This PR corrects the method name.

Retain original dtype for topk_weights:

Additionally, there appears to be a potential bug in FlashInfer where calling MnnvlMoe.mnnvl_moe_alltoallv_prepare_without_allgather() incorrectly changes the data type of topk_weights from torch.float32 to torch.int32. This is likely related to the prepared_local_scales tensor being allocated as torch.int32 within FlashInfer's moe_prepare() function.

As a temporary workaround, this PR saves the original data type of topk_weights before the function call and restores it immediately after, ensuring the tensor's integrity for subsequent operations.

Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.