
Conversation

@CSWYF3634076 (Contributor) commented Oct 15, 2025

Purpose

Fix the following issue.

Since SharedFusedMoE's forward now returns a tuple (#26145), the returned value has no flatten() method:

(EngineCore_DP0 pid=54051)   File "/root/paddlejob/wangyafeng/myGithub/vllm/vllm/model_executor/models/ernie45_vl_moe.py", line 486, in forward
(EngineCore_DP0 pid=54051)     hidden_states = self.mlp(hidden_states, visual_token_mask, **kwargs)
(EngineCore_DP0 pid=54051)                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=54051)   File "/root/paddlejob/wangyafeng/py312env/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
(EngineCore_DP0 pid=54051)     return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=54051)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=54051)   File "/root/paddlejob/wangyafeng/py312env/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
(EngineCore_DP0 pid=54051)     return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=54051)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=54051)   File "/root/paddlejob/wangyafeng/myGithub/vllm/vllm/model_executor/models/ernie45_vl_moe.py", line 358, in forward
(EngineCore_DP0 pid=54051)     ).flatten()
(EngineCore_DP0 pid=54051)       ^^^^^^^
(EngineCore_DP0 pid=54051) AttributeError: 'tuple' object has no attribute 'flatten'
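For context, a minimal standalone sketch of the pattern behind the fix. fake_shared_fused_moe below is a stand-in for the real SharedFusedMoE layer, not vLLM code; since #26145 the real layer's forward returns a (shared_output, routed_output) tuple, so the result has to be unpacked and combined before tensor methods such as flatten() can be called.

import torch

# Stand-in for SharedFusedMoE.forward, which now returns a tuple (#26145).
def fake_shared_fused_moe(x: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    # Placeholder shared-expert and routed-expert outputs.
    return x * 0.5, x * 0.5

x = torch.randn(4, 8)

# Old pattern -- raises AttributeError, since a tuple has no flatten():
#   hidden = fake_shared_fused_moe(x).flatten()

# Fixed pattern -- unpack first, combine, then flatten:
shared_output, routed_output = fake_shared_fused_moe(x)
hidden = (shared_output + routed_output).flatten()
print(hidden.shape)  # torch.Size([32])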

Test Plan

Server launch command:

vllm serve baidu/ERNIE-4.5-VL-28B-A3B-PT --served-model-name ERNIE-45-VL-28B --port 8503 --gpu-memory-utilization 0.95 --trust-remote-code

Client test script:
import base64
import os

import requests
from openai import OpenAI
from urllib.parse import urlparse

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://127.0.0.1:8503/v1"

client = OpenAI(
    # defaults to os.environ.get("OPENAI_API_KEY")
    api_key=openai_api_key,
    base_url=openai_api_base,
)

def encode_base64_content_from_url(content_url: str) -> str:
    """Encode a content retrieved from a remote url to base64 format."""

    with requests.get(content_url) as response:
        response.raise_for_status()
        result = base64.b64encode(response.content).decode("utf-8")

    return result

def to_base64(content_path: str) -> str:
    """Encode content from a remote URL or local file to base64 format."""
    parsed = urlparse(content_path)
    if parsed.scheme in ("http", "https", "ftp"):
        print(content_path)
        with requests.get(content_path) as response:
            response.raise_for_status()
            data = response.content
    else:
        print(content_path)
        if not os.path.exists(content_path):
            raise FileNotFoundError(f"File not found: {content_path}")
        with open(content_path, "rb") as f:
            data = f.read()

    return base64.b64encode(data).decode("utf-8")

# Single-image input inference
def run_image() -> None:
    
    image_url_1 = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"
    image_base64_1 = to_base64(image_url_1)
    chat_stream = client.chat.completions.create(
        model="ERNIE-45-VL-28B",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Is the dog on the left or right"},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{image_base64_1}"},
                    },
                ],
            }
        ],
        max_completion_tokens=1024,
        temperature=0,
        top_p=1,
        stream=True,
        extra_body={
            "skip_special_tokens": False,
            "chat_template_kwargs":{"enable_thinking": False}
        }
    )


    reasoning_content_list = []
    content_list = []
    for chunk in chat_stream:
        # print(chunk)
        reasoning_content = getattr(chunk.choices[0].delta, "reasoning_content", None)
        content = chunk.choices[0].delta.content
        if reasoning_content:
            print(reasoning_content, end="", flush=True)
            reasoning_content_list.append(reasoning_content)
        if content:
            print(content, end="", flush=True)
            content_list.append(content)


def main(args) -> None:
    run_image()


if __name__ == "__main__":
    # args = parse_args()
    args = ""
    main(args)

Test Result

图中展示了一只坐在沙滩上的狗和一位坐在狗旁边的女性。狗位于图片的左侧,它穿着带有彩色图案的背带,前爪抬起与女性相握,似乎在互动玩耍。女性穿着格子衬衫和深色裤子,坐在狗的右侧,面带微笑,看向狗的方向。背景是广阔的海滩和海洋,海浪轻轻拍打着岸边,天空呈现出柔和的光线,可能是日出或日落时分,整个画面给人一种温馨、宁静的感觉。

English translation:

The picture shows a dog sitting on the beach and a woman sitting next to it. The dog is on the left side of the picture, wearing a harness with a colorful pattern; its front paws are raised and held by the woman, as if the two are playing together. The woman is wearing a checkered shirt and dark pants, sitting to the right of the dog with a smile on her face, looking toward the dog. The background is a vast beach and ocean, with waves gently lapping the shore. The sky has a soft light, perhaps at sunrise or sunset, giving the whole picture a warm and peaceful feeling.

@gemini-code-assist (bot) left a comment:

Code Review

This pull request correctly fixes a crash in the ernie45_vl_moe model that occurred when processing mixed-modality inputs. The original code incorrectly assumed the MoE layer returns a tensor, while it returns a tuple, leading to an AttributeError. The fix correctly unpacks the tuple and handles the outputs from shared and regular experts.

However, I've identified a critical issue with the current implementation. For layers that are MoE for one modality (e.g., vision) but a standard MLP for another (e.g., text), the code will crash. This is because it unconditionally tries to unpack a tuple, but the MLP returns a single tensor. I've provided a suggestion to make the code robust by checking the type of the expert module before processing its output, which will resolve this issue.
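A hedged sketch of the guard the bot is describing, assuming a layer whose visual experts can be either a SharedFusedMoE or a plain MLP; the attribute and variable names are assumptions for illustration, not the merged diff:

# Illustrative only -- names are assumptions, not the actual ernie45_vl_moe.py code.
# SharedFusedMoE.forward returns a (shared_output, routed_output) tuple, while a
# plain MLP returns a single tensor, so guard on the module type before unpacking.
visual_output = self.visual_experts(hidden_states[visual_token_mask])
if isinstance(self.visual_experts, SharedFusedMoE):
    shared_output, routed_output = visual_output
    visual_output = shared_output + routed_output
final_hidden_states[visual_token_mask] = visual_output

Checking isinstance(visual_output, tuple) would achieve the same effect without importing the layer class; the module-type check simply mirrors the wording of the review.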

@CSWYF3634076 (Contributor, Author) commented:

cc @bnellnm

Review thread on the following lines from the diff:

text_token_mask = ~visual_token_mask
final_hidden_states = torch.zeros_like(hidden_states)
final_experts_hidden_states = torch.zeros_like(hidden_states)
final_shard_ouput = (
Collaborator:

nit: shard -> shared

Contributor Author:

done

@bnellnm (Collaborator) left a comment:

Thanks for fixing this!

@CSWYF3634076 (Contributor, Author) replied:

@bnellnm Can you trigger the CI?

@DarkLight1337 added the "ready" label (ONLY add when PR is ready to merge / full CI is needed) on Oct 16, 2025
@DarkLight1337 (Member) left a comment:

Sorry for the delay

@DarkLight1337 enabled auto-merge (squash) on October 16, 2025 07:06
@vllm-bot merged commit e519287 into vllm-project:main on Oct 16, 2025
53 of 55 checks passed