
Conversation

@CSWYF3634076 (Contributor) commented Oct 15, 2025

Purpose

Fix the following issue.

Since SharedFusedMoE's forward now returns a tuple (#26145), the returned value has no flatten() method:

(EngineCore_DP0 pid=54051)   File "/root/paddlejob/wangyafeng/myGithub/vllm/vllm/model_executor/models/ernie45_vl_moe.py", line 486, in forward
(EngineCore_DP0 pid=54051)     hidden_states = self.mlp(hidden_states, visual_token_mask, **kwargs)
(EngineCore_DP0 pid=54051)                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=54051)   File "/root/paddlejob/wangyafeng/py312env/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
(EngineCore_DP0 pid=54051)     return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=54051)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=54051)   File "/root/paddlejob/wangyafeng/py312env/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
(EngineCore_DP0 pid=54051)     return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=54051)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=54051)   File "/root/paddlejob/wangyafeng/myGithub/vllm/vllm/model_executor/models/ernie45_vl_moe.py", line 358, in forward
(EngineCore_DP0 pid=54051)     ).flatten()
(EngineCore_DP0 pid=54051)       ^^^^^^^
(EngineCore_DP0 pid=54051) AttributeError: 'tuple' object has no attribute 'flatten'
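For context, a minimal standalone sketch of the pattern behind the fix. fake_shared_fused_moe below is a stand-in for the real SharedFusedMoE layer, not vLLM code; since #26145 the real layer's forward returns a (shared_output, routed_output) tuple, so the result has to be unpacked and combined before tensor methods such as flatten() can be called.

import torch

# Stand-in for SharedFusedMoE.forward, which now returns a tuple (#26145).
def fake_shared_fused_moe(x: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    # Placeholder shared-expert and routed-expert outputs.
    return x * 0.5, x * 0.5

x = torch.randn(4, 8)

# Old pattern -- raises AttributeError, since a tuple has no flatten():
#   hidden = fake_shared_fused_moe(x).flatten()

# Fixed pattern -- unpack first, combine, then flatten:
shared_output, routed_output = fake_shared_fused_moe(x)
hidden = (shared_output + routed_output).flatten()
print(hidden.shape)  # torch.Size([32])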

Test Plan

Server launch command:

vllm serve baidu/ERNIE-4.5-VL-28B-A3B-PT --served-model-name ERNIE-45-VL-28B --port 8503 --gpu-memory-utilization 0.95 --trust-remote-code

Client test script:
import base64
import os

import requests
from openai import OpenAI
from urllib.parse import urlparse

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://127.0.0.1:8503/v1"

client = OpenAI(
    # defaults to os.environ.get("OPENAI_API_KEY")
    api_key=openai_api_key,
    base_url=openai_api_base,
)

def encode_base64_content_from_url(content_url: str) -> str:
    """Encode a content retrieved from a remote url to base64 format."""

    with requests.get(content_url) as response:
        response.raise_for_status()
        result = base64.b64encode(response.content).decode("utf-8")

    return result

def to_base64(content_path: str) -> str:
    """Encode content from a remote URL or local file to base64 format."""
    parsed = urlparse(content_path)
    if parsed.scheme in ("http", "https", "ftp"):
        print(content_path)
        with requests.get(content_path) as response:
            response.raise_for_status()
            data = response.content
    else:
        print(content_path)
        if not os.path.exists(content_path):
            raise FileNotFoundError(f"File not found: {content_path}")
        with open(content_path, "rb") as f:
            data = f.read()

    return base64.b64encode(data).decode("utf-8")

# Single-image input inference
def run_image() -> None:
    
    image_url_1 = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"
    image_base64_1 = to_base64(image_url_1)
    chat_stream = client.chat.completions.create(
        model="ERNIE-45-VL-28B",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Is the dog on the left or right"},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{image_base64_1}"},
                    },
                ],
            }
        ],
        max_completion_tokens=1024,
        temperature=0,
        top_p=1,
        stream=True,
        extra_body={
            "skip_special_tokens": False,
            "chat_template_kwargs":{"enable_thinking": False}
        }
    )


    reasoning_content_list = []
    content_list = []
    for chunk in chat_stream:
        # print(chunk)
        reasoning_content = getattr(chunk.choices[0].delta, "reasoning_content", None)
        content = chunk.choices[0].delta.content
        if reasoning_content:
            print(reasoning_content, end="", flush=True)
            reasoning_content_list.append(reasoning_content)
        if content:
            print(content, end="", flush=True)
            content_list.append(content)


def main(args) -> None:
    run_image()


if __name__ == "__main__":
    # args = parse_args()
    args = ""
    main(args)

Test Result

图中展示了一只坐在沙滩上的狗和一位坐在狗旁边的女性。狗位于图片的左侧,它穿着带有彩色图案的背带,前爪抬起与女性相握,似乎在互动玩耍。女性穿着格子衬衫和深色裤子,坐在狗的右侧,面带微笑,看向狗的方向。背景是广阔的海滩和海洋,海浪轻轻拍打着岸边,天空呈现出柔和的光线,可能是日出或日落时分,整个画面给人一种温馨、宁静的感觉。

English translation:

The picture shows a dog sitting on the beach and a woman sitting next to it. The dog is on the left side of the picture, wearing a harness with a colorful pattern; its front paws are raised and held by the woman, as if the two are playing together. The woman is wearing a checkered shirt and dark pants, sitting to the right of the dog with a smile on her face, looking toward the dog. The background is a vast beach and ocean, with waves gently lapping the shore. The sky has a soft light, perhaps at sunrise or sunset, giving the whole picture a warm and peaceful feeling.

@gemini-code-assist (bot) left a comment:

Code Review

This pull request correctly fixes a crash in the ernie45_vl_moe model that occurred when processing mixed-modality inputs. The original code incorrectly assumed the MoE layer returns a tensor, while it returns a tuple, leading to an AttributeError. The fix correctly unpacks the tuple and handles the outputs from shared and regular experts.

However, I've identified a critical issue with the current implementation. For layers that are MoE for one modality (e.g., vision) but a standard MLP for another (e.g., text), the code will crash. This is because it unconditionally tries to unpack a tuple, but the MLP returns a single tensor. I've provided a suggestion to make the code robust by checking the type of the expert module before processing its output, which will resolve this issue.
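A hedged sketch of the guard the bot is describing, assuming a layer whose visual experts can be either a SharedFusedMoE or a plain MLP; the attribute and variable names are assumptions for illustration, not the merged diff:

# Illustrative only -- names are assumptions, not the actual ernie45_vl_moe.py code.
# SharedFusedMoE.forward returns a (shared_output, routed_output) tuple, while a
# plain MLP returns a single tensor, so guard on the module type before unpacking.
visual_output = self.visual_experts(hidden_states[visual_token_mask])
if isinstance(self.visual_experts, SharedFusedMoE):
    shared_output, routed_output = visual_output
    visual_output = shared_output + routed_output
final_hidden_states[visual_token_mask] = visual_output

Checking isinstance(visual_output, tuple) would achieve the same effect without importing the layer class; the module-type check simply mirrors the wording of the review.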

@CSWYF3634076 (Contributor, Author) commented:

cc @bnellnm

Review thread on the following lines from the diff:

text_token_mask = ~visual_token_mask
final_hidden_states = torch.zeros_like(hidden_states)
final_experts_hidden_states = torch.zeros_like(hidden_states)
final_shard_ouput = (
Collaborator:

nit: shard -> shared

Contributor Author:

done

@bnellnm (Collaborator) left a comment:

Thanks for fixing this!

@CSWYF3634076 (Contributor, Author) replied:

@bnellnm Can you trigger the CI?

@DarkLight1337 added the "ready" label (ONLY add when PR is ready to merge / full CI is needed) on Oct 16, 2025
@DarkLight1337 (Member) left a comment:

Sorry for the delay

@DarkLight1337 enabled auto-merge (squash) on October 16, 2025 07:06
@vllm-bot merged commit e519287 into vllm-project:main on Oct 16, 2025
53 of 55 checks passed