[Model] Apply shared experts overlap optimization to all models with shared experts #26145
Merged
Conversation
hmellor reviewed on Oct 7, 2025
mgoin approved these changes on Oct 8, 2025
mgoin (Member) left a comment:
LGTM, nice refactor, thank you.
yang926 pushed a commit to yang926/vllm_1008 referencing this pull request on Oct 9, 2025
…shared experts (vllm-project#26145) Signed-off-by: Bill Nell <[email protected]> Signed-off-by: yang926 <[email protected]>
xuebwang-amd pushed a commit to xuebwang-amd/vllm referencing this pull request on Oct 10, 2025
…shared experts (vllm-project#26145) Signed-off-by: Bill Nell <[email protected]> Signed-off-by: xuebwang-amd <[email protected]>
Dhruvilbhatt pushed a commit to Dhruvilbhatt/vllm referencing this pull request on Oct 14, 2025
…shared experts (vllm-project#26145) Signed-off-by: Bill Nell <[email protected]> Signed-off-by: Dhruvil Bhatt <[email protected]>
lywa1998 pushed a commit to lywa1998/vllm referencing this pull request on Oct 20, 2025
…shared experts (vllm-project#26145) Signed-off-by: Bill Nell <[email protected]>
alhridoy pushed a commit to alhridoy/vllm referencing this pull request on Oct 24, 2025
…shared experts (vllm-project#26145) Signed-off-by: Bill Nell <[email protected]>
wangxiyuan pushed a commit to vllm-project/vllm-ascend referencing this pull request on Oct 24, 2025
### What this PR does / why we need it?

This is step 1 of refactoring the code to adapt to vLLM main; this PR is aligned with vllm-project/vllm@17c540a.

1. Refactor deepseek to the latest code architecture as of vllm-project/vllm@17c540a
2. A batch of fixes due to vLLM changes:
   - Fix `AscendScheduler` `__post_init__`, caused by vllm-project/vllm#25075
   - Fix `AscendScheduler` init getting an unexpected arg `block_size`, caused by vllm-project/vllm#26296
   - Fix `KVCacheManager` `get_num_common_prefix_blocks` arg, caused by vllm-project/vllm#23485
   - Fix `MLAAttention` import, caused by vllm-project/vllm#25103
   - Fix `SharedFusedMoE` import, caused by vllm-project/vllm#26145
   - Fix `LazyLoader` import, caused by vllm-project/vllm#27022
   - Fix `vllm.utils.swap_dict_values` import, caused by vllm-project/vllm#26990
   - Fix `Backend` enum import, caused by vllm-project/vllm#25893
   - Fix the `CompilationLevel` to `CompilationMode` renaming issue introduced by vllm-project/vllm#26355
   - Fix fused_moe ops, caused by vllm-project/vllm#24097
   - Fix the bert model because of `inputs_embeds`, caused by vllm-project/vllm#25922
   - Fix MRope because of the `get_input_positions_tensor` to `get_mrope_input_positions` change, caused by vllm-project/vllm#24172
   - Fix `splitting_ops` changes introduced by vllm-project/vllm#25845
   - Fix multi-modality changes introduced by vllm-project/vllm#16229
   - Fix the lora bias dropping issue introduced by vllm-project/vllm#25807
   - Fix the structured output break introduced by vllm-project/vllm#26737

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

CI passed with existing tests.
- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: MengqingCao <[email protected]> Signed-off-by: Icey <[email protected]> Co-authored-by: Icey <[email protected]>
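The `SharedFusedMoE` import fix in the list above tracks this PR's move of the class into the `fused_moe` directory. A minimal sketch of the downstream adjustment follows; the pre-move module path in particular is an assumption for illustration, so verify both paths against your vLLM version:

```python
# Sketch of a version-tolerant import after vllm-project/vllm#26145
# moved SharedFusedMoE into the fused_moe directory.
try:
    # Assumed post-move location (the class now lives under fused_moe).
    from vllm.model_executor.layers.fused_moe import SharedFusedMoE
except ImportError:
    # Assumed pre-move location on older vLLM versions.
    from vllm.model_executor.layers.shared_fused_moe import SharedFusedMoE
```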
xuebwang-amd pushed a commit to xuebwang-amd/vllm referencing this pull request on Oct 24, 2025
…shared experts (vllm-project#26145) Signed-off-by: Bill Nell <[email protected]> Signed-off-by: xuebwang-amd <[email protected]>
0xrushi pushed a commit to 0xrushi/vllm referencing this pull request on Oct 26, 2025
…shared experts (vllm-project#26145) Signed-off-by: Bill Nell <[email protected]> Signed-off-by: 0xrushi <[email protected]>
rtourgeman pushed a commit to rtourgeman/vllm referencing this pull request on Nov 10, 2025
…shared experts (vllm-project#26145) Signed-off-by: Bill Nell <[email protected]>
Purpose

- Use `SharedFusedMoE` in all models that use shared experts. This will enable the shared experts/communication overlap optimization for all the changed models.
- Move the `SharedFusedMoE` class to the `fused_moe` directory.
- Modify `SharedFusedMoE` to behave like `FusedMoE` when the `shared_experts` are `None`.
- Make `SharedFusedMoE` work with `torch.compile`.

For most models, the changes consist of renaming `FusedMoE` -> `SharedFusedMoE` and passing the shared experts module as a parameter to `SharedFusedMoE`. A few models required extra tweaks: `aria`, `ernie45_vl_moe`, and the `qwen` models. A toy sketch of this wiring is shown below.
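The renaming pattern above can be illustrated with a self-contained toy module. `SharedFusedMoESketch` below is a stand-in that assumes only the behavior described in this PR; it is not vLLM's actual `SharedFusedMoE` class or constructor signature:

```python
import torch
from torch import nn

class SharedFusedMoESketch(nn.Module):
    """Toy stand-in for SharedFusedMoE: with shared_experts=None it
    degenerates to the plain routed-experts layer (i.e. behaves like
    FusedMoE); otherwise the shared-experts output is added to the
    routed output. In vLLM, owning the shared experts module is what
    lets the layer overlap their computation with the routed experts'
    communication."""

    def __init__(self, routed_experts: nn.Module,
                 shared_experts: nn.Module | None = None):
        super().__init__()
        self.routed_experts = routed_experts
        self.shared_experts = shared_experts

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        routed_out = self.routed_experts(x)
        if self.shared_experts is None:
            return routed_out  # FusedMoE-like behavior
        return routed_out + self.shared_experts(x)

# "Renaming FusedMoE -> SharedFusedMoE" then amounts to passing the
# model's shared experts module as one extra constructor argument.
hidden = 8
layer = SharedFusedMoESketch(
    routed_experts=nn.Linear(hidden, hidden),  # stand-in for the fused MoE
    shared_experts=nn.Linear(hidden, hidden),  # stand-in for shared experts
)
out = layer(torch.randn(2, hidden))
```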
Test Plan

Test all modified models.
Note: all model types appear to be covered by `tests/models/registry.py`.
Test Result
TBD