Conversation
@yiliu30 yiliu30 commented Aug 15, 2025

@yiliu30 yiliu30 requested a review from Copilot August 16, 2025 05:55
@Copilot Copilot AI left a comment


Pull Request Overview

This PR modifies the FP8-quantized MoE (Mixture of Experts) forward pass to support passing chunk-size information to the underlying MoE operation. The change enables dynamic chunk-size configuration by extracting the token count (tokens_num) from hidden_states and delegating to the original module for additional kwargs.

  • Adds a helper method to extract extra kwargs from the original module
  • Modifies forward_quant to pass chunk size information via extra_kwargs
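The two bullets above can be sketched roughly as follows. This is a hypothetical illustration of the described pattern, not the actual source: the class names, the `moe_chunk_size` attribute, and the `_get_extra_kwargs` helper signature are all assumptions.

```python
# Hedged sketch of the PR's pattern: a patched FP8 MoE module derives extra
# kwargs (here, a chunk size) from the original module and the incoming
# token count, then forwards them to the underlying MoE op.
# All names below are illustrative assumptions, not the real codebase.
class OriginalMoE:
    def __init__(self, chunk_size=None):
        # Optional statically configured chunk size on the original module.
        self.moe_chunk_size = chunk_size


class PatchedMoE:
    def __init__(self, orig_mod):
        self.orig_mod = orig_mod

    def _get_extra_kwargs(self, tokens_num):
        # Helper: delegate to the original module for additional kwargs.
        chunk_size = getattr(self.orig_mod, "moe_chunk_size", None)
        if chunk_size is None:
            # Fall back to processing all tokens in a single chunk.
            chunk_size = tokens_num
        return {"chunk_size": chunk_size}

    def forward_quant(self, hidden_states):
        # Derive the token count from the input, then pass chunk-size
        # information to the MoE op via extra_kwargs.
        tokens_num = hidden_states.shape[0]
        extra_kwargs = self._get_extra_kwargs(tokens_num)
        return self._moe_op(hidden_states, **extra_kwargs)

    def _moe_op(self, hidden_states, chunk_size):
        # Stand-in for the fused FP8 MoE kernel call.
        return hidden_states, chunk_size
```

With no configured chunk size, the op receives the full token count; a configured `moe_chunk_size` takes precedence.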


@yiliu30 yiliu30 merged commit 7b7e01e into aice/v122 Aug 22, 2025
2 checks passed
@yiliu30 yiliu30 deleted the moe-chunk-size branch August 22, 2025 01:29