[BugFix] Illegal memory access for MoE On H20 #13693
Conversation
Signed-off-by: Abatom <[email protected]> Co-authored-by: Apache9 <[email protected]> Co-authored-by: kirbyzhou <[email protected]> Co-authored-by: ch-tiger <[email protected]>
Since the problem is that we slice the cache with a smaller size while later operators may write beyond that limit, it does not always crash the program. We also tested on two H800 machines and saw no problem there, but I suspect it may affect the quality of the output tokens. We also tested SGLang, which has the same problem (see sgl-project/sglang#3779), on a single machine with 8 GPUs, each with 140GB+ of memory; the only symptom there was that it became extremely slow when the input prompt was longer than 32K...
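To illustrate the failure pattern, here is a minimal PyTorch sketch (hypothetical names and sizes, not vLLM's actual MoE code):

```python
import torch

# Sketch of the bug class described above (hypothetical names, not
# vLLM's actual MoE code). A preallocated cache is viewed with the
# chunk's raw token count, but the downstream kernel writes one row per
# *padded* token, so it runs past the end of the view.
MAX_TOKENS, HIDDEN = 65536, 4096
cache = torch.empty(MAX_TOKENS, HIDDEN, device="cuda")

num_tokens = 40000   # tokens actually in this chunk
num_padded = 40960   # row count after block-size alignment

# Buggy pattern: the workspace view is sized with num_tokens ...
workspace = cache[:num_tokens]
# ... but a raw CUDA/Triton kernel receives workspace.data_ptr() plus a
# row count of num_padded, so it writes num_padded * HIDDEN elements,
# past the end of the view. Depending on what lies beyond, this either
# silently corrupts neighboring tensors (degraded output quality) or
# raises "CUDA error: an illegal memory access was encountered".

# Fixed pattern: slice with the row count the kernel will actually write.
workspace = cache[:num_padded]
```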
Thank you for the bug report and patch. I could not reproduce on H200, but the current oversight makes sense.
To reproduce on H200, try sending more than one concurrent request of length 50k+. I have seen this issue with DeepSeek-R1 on 8xH200, and it seems to be resolved with this diff. Great catch!
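A sketch of that reproduction against an OpenAI-compatible vLLM server (the URL, model name, and worker count are placeholders for your deployment):

```python
# Fire several concurrent ~50k-token requests at an OpenAI-compatible
# vLLM server. URL and model name are placeholders.
import concurrent.futures
import requests

URL = "http://localhost:8000/v1/completions"
PROMPT = "word " * 50_000  # roughly 50k tokens of filler text

def send_request(_):
    resp = requests.post(URL, json={
        "model": "deepseek-ai/DeepSeek-R1",
        "prompt": PROMPT,
        "max_tokens": 64,
    }, timeout=600)
    return resp.status_code

# Four concurrent long requests were enough to trigger it in my testing.
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    print(list(pool.map(send_request, range(4))))
```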
Signed-off-by: Russell Bryant <[email protected]>
Same error for 32K+ context length; it seems this PR fixes my problem.
After applying this PR, we still hit this error with 80k characters. We deploy DeepSeek R1 across three 8*H20 machines with 128k max_model_len.
Could you please provide more detailed information about the crash? We still saw illegal memory access after applying this PR too, and it happened in cuBLAS. It finally turned out that our CUDA toolkit version and the Linux NVIDIA driver were not compatible; after upgrading the driver, the error disappeared. Thanks.
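For anyone hitting the same thing, a quick way to compare the two versions (assuming a PyTorch install and `nvidia-smi` on PATH):

```python
# Environment check for the CUDA/driver mismatch mentioned above.
import subprocess
import torch

print("torch CUDA runtime:", torch.version.cuda)
driver = subprocess.run(
    ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
    capture_output=True, text=True,
).stdout.strip()
print("NVIDIA driver:", driver)
# Compare the driver against the minimum required for your CUDA runtime
# in NVIDIA's CUDA compatibility table; e.g. CUDA 12.4 needs a 550+ driver.
```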
Hi, we need this PR to fix the same issue with DeepSeek-R1 on H20. When will a release containing this PR be published? Is there a timeline?
We built a Docker image on the Ray 2.40.0 base image with the main branch of vLLM installed, and started a Ray cluster with three 8*H20 nodes; CUDA version V12.4.131, NVIDIA driver version 560.35.03.
The driver version seems fine... Then maybe there is still another issue somewhere in your setup...
Do you have any insights on the slow decoding? We tested on H200; when the prompt is around 8k, the decoding speed is only 4.x tokens/s.
Signed-off-by: Louis Ulmer <[email protected]>
When we attempted to deploy DeepSeek R1 671B on two 8-card H20 machines, vLLM crashed and reported illegal memory access whenever the prompt length exceeded 32K. This PR fixes the bug.
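For readers wondering why the crash only appears past roughly 32K tokens: assuming the MoE path processes tokens in fixed-size chunks (the chunk size, names, and loop below are illustrative, not the actual vLLM implementation), only prompts longer than the chunk size enter the per-chunk slicing path where an undersized workspace view can be produced.

```python
import torch

CHUNK_SIZE = 32768   # hypothetical chunk size, chosen to match the
                     # ~32K threshold reported in this thread
BLOCK = 64

def align_up(n: int, block: int = BLOCK) -> int:
    """Round n up to a multiple of block (padding for the fused kernel)."""
    return (n + block - 1) // block * block

def fused_kernel(chunk: torch.Tensor, workspace: torch.Tensor,
                 padded_rows: int) -> None:
    # Stand-in for a fused MoE kernel that writes one row per padded token.
    workspace[:padded_rows].zero_()

def run_moe(hidden: torch.Tensor, cache: torch.Tensor) -> None:
    num_tokens = hidden.shape[0]
    # Prompts shorter than CHUNK_SIZE take a single pass; only longer
    # prompts exercise the per-chunk path where the slice size matters.
    for start in range(0, num_tokens, CHUNK_SIZE):
        chunk = hidden[start:start + CHUNK_SIZE]
        padded = align_up(chunk.shape[0])
        workspace = cache[:padded]   # must cover every row the kernel writes
        fused_kernel(chunk, workspace, padded)
```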