Conversation
@PerryZhang01 PerryZhang01 commented Oct 29, 2025

Purpose

This PR adds EPLB support for the ROCm backend, achieving feature parity with the existing CUDA implementation. The implementation was validated on DeepSeek-R1.

Test Plan

We enable EPLB for DeepSeek-R1 on MI355 with the following parameters.

server:
vllm serve $model_path \
    --tensor-parallel-size 8 \
    --max-num-batched-tokens 32768 \
    --trust-remote-code \
    --no-enable-prefix-caching \
    --disable-log-requests \
    --compilation-config '{"cudagraph_mode": "FULL_AND_PIECEWISE"}' \
    --gpu_memory_utilization 0.8 \
    --block-size 1 \
    --enable-expert-parallel \
    --enable-eplb \
    --num-redundant-experts 8 \
    --eplb-log-balancedness \
    --eplb-window-size 3000 \
    --eplb-step-interval 1000
client:
python -m vllm.entrypoints.cli.main bench serve \
    --host localhost \
    --port 8000 \
    --model ${model_path} \
    --dataset-name random \
    --random-input-len 1024 \
    --random-output-len 1024 \
    --max-concurrency 64 \
    --num-prompts 128 \
    --seed 123 \
    --percentile-metrics ttft,tpot,itl,e2el \
    --ignore-eos

Test Result

Benchmark Result:
[benchmark result screenshot]
After enabling EPLB, the balancedness metric increased from 0.55 to 0.65 with random data. However, in the decode phase the average token count per step is too small, so the balancedness metric dropped to 0.3.
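For intuition about the metric above: balancedness is commonly summarized as mean expert load divided by max expert load, so 1.0 means perfectly even routing and small values mean a few experts absorb most tokens. The sketch below uses that common definition as a standalone illustration; it is not vLLM's actual logging code, and the function name is ours.

```python
# Hedged sketch: one common definition of expert-load "balancedness"
# (mean load / max load). 1.0 = perfectly balanced routing.
# This is illustrative, not vLLM's actual implementation.

def balancedness(expert_loads: list[int]) -> float:
    """Return mean/max of per-expert token counts (0 < value <= 1)."""
    if not expert_loads or max(expert_loads) == 0:
        return 1.0  # no traffic: treat as balanced
    mean_load = sum(expert_loads) / len(expert_loads)
    return mean_load / max(expert_loads)

# A uniform load scores 1.0; a skewed one scores low.
print(balancedness([100, 100, 100, 100]))  # 1.0
print(balancedness([400, 50, 30, 20]))     # 125/400 = 0.3125
```

Under this definition, the drop to 0.3 in decode is expected: with few tokens per step, per-expert counts are noisy and the max easily dominates the mean.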

We also used lm_eval to validate the accuracy of EPLB on the gsm8k dataset; the results are below:

    |Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
    |-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
    |gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.9492|±  |0.0060|
    |     |       |strict-match    |     5|exact_match|↑  |0.9492|±  |0.0060|
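As a quick sanity check on the numbers above, the Stderr column can be turned into an approximate 95% confidence interval using the normal approximation. This is our arithmetic on the reported values, not part of lm_eval's output:

```python
# Hedged sketch: approximate 95% CI for the reported gsm8k exact_match,
# using the normal approximation (value +/- 1.96 * stderr).
value, stderr = 0.9492, 0.0060
ci_lo = value - 1.96 * stderr
ci_hi = value + 1.96 * stderr
print(f"gsm8k exact_match ~ {value:.4f}, 95% CI [{ci_lo:.4f}, {ci_hi:.4f}]")
```

The interval comfortably overlaps typical DeepSeek-R1 gsm8k baselines, which is what "no accuracy regression from EPLB" amounts to here.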

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds support for Expert Parallelism Load Balancing (EPLB) on the ROCm backend, achieving feature parity with the CUDA implementation. The changes are logical and well-contained, primarily enabling the feature for ROCm and updating the relevant checks and method calls. I have one point of feedback regarding the removal of a tensor contiguity assertion, which could potentially lead to issues with distributed communication. I've suggested ensuring tensor contiguity to maintain correctness and performance.

@PerryZhang01
Author

@abmfy could you please help review this PR?

Collaborator

@tjtanaa tjtanaa left a comment

LGTM

Member

@hmellor hmellor left a comment

Looks much cleaner now, just one thing I'm not sure about

Comment on lines +1957 to +2087
    assert all(
        weight.is_contiguous()
        for name, weight in weights
        if not name.startswith("_shared_experts.")
    )
Member

I'm not sure about this change. @abmfy could this cause issues for other EPLB use cases?

Author

The shared expert weights are currently not contiguous, likely because a later MR applied a stride operation. Since EPLB only applies to routed experts (and this function only returns routed experts), we removed the contiguity check for shared experts; shared experts may undergo further stride operations in the future and should not be asserted on here.
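The relaxed assertion can be illustrated with a standalone sketch. Here `FakeWeight` stands in for `torch.Tensor` (so the example runs without PyTorch), and the weight names are hypothetical; only the filtering logic mirrors the PR's assert:

```python
# Hedged sketch of the relaxed assertion: contiguity is enforced only on
# routed-expert weights, while names under "_shared_experts." are skipped.
from dataclasses import dataclass

@dataclass
class FakeWeight:
    contiguous: bool
    def is_contiguous(self) -> bool:
        return self.contiguous

# Hypothetical named weights; the shared-expert one is non-contiguous,
# e.g. after a stride operation.
weights = [
    ("experts.w13_weight", FakeWeight(True)),
    ("experts.w2_weight", FakeWeight(True)),
    ("_shared_experts.gate_proj", FakeWeight(False)),
]

# Mirrors the PR's assert: only routed experts must be contiguous,
# so the non-contiguous shared-expert weight no longer trips it.
assert all(
    weight.is_contiguous()
    for name, weight in weights
    if not name.startswith("_shared_experts.")
)
```

Without the `startswith` filter, the same `all(...)` over these weights would be False, which is exactly the failure the PR avoids.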

Author

@hmellor @abmfy can someone confirm the change here?

Author

@PerryZhang01 PerryZhang01 Nov 7, 2025

[gsm8k accuracy screenshot]

I have rebased the latest code and validated the accuracy of EPLB on gsm8k datasets again.

Member

@hmellor should be good here since EPLB only applies to routed experts; the contiguity check is skipped only for shared experts.

@mergify mergify bot commented Nov 6, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @PerryZhang01.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Nov 6, 2025
@PerryZhang01
Author

@hmellor @abmfy @BowenBao could you help review this PR?

Member

@abmfy abmfy left a comment

LGTM.
Thanks for the contribution!


Labels

rocm Related to AMD ROCm
