@zhewenl zhewenl commented Oct 31, 2025

Purpose

More details in #27619.

EAGLE speculative decoding fails on AMD GPUs with an HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION error. The error occurs as soon as prompt processing starts, after CUDA graph capture has completed successfully.
It can be reproduced with spec decode + EAGLE + the Triton attention backend (the default on AMD), for example:

python3 offline_inference/spec_decode.py --test --method eagle --num_spec_tokens 3 --dataset-name hf --dataset-path philschmid/mt-bench --num-prompts 80 --temp 0 --top-p 1.0 --top-k -1 --tp 1 --enable-chunked-prefill --max-model-len 2048

Error Details

:0:rocdevice.cpp:3675: Callback: Queue 0x7f4db0300000 aborting with error:
HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION: The agent attempted to access memory beyond the largest legal address. code: 0x29

(Using a different backend may work, e.g. ROCM_AITER_FA or ROCM_AITER_UNIFIED_ATTN, but TritonAttentionBackend is the default attention backend for AMD: gist:fb8bbb2cbde391905d86908ca4a46c02)
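
To compare backends on the same workload, the reproduction command can be wrapped so that only the backend selection changes between runs. The following is a minimal sketch, not part of this PR: the helper name `build_repro` is hypothetical, and it assumes the installed vLLM build reads the `VLLM_ATTENTION_BACKEND` environment variable and accepts the backend names mentioned above.

```python
import os
import shlex

# Reproduction command copied verbatim from the PR description.
REPRO_CMD = (
    "python3 offline_inference/spec_decode.py --test --method eagle "
    "--num_spec_tokens 3 --dataset-name hf --dataset-path philschmid/mt-bench "
    "--num-prompts 80 --temp 0 --top-p 1.0 --top-k -1 --tp 1 "
    "--enable-chunked-prefill --max-model-len 2048"
)


def build_repro(backend=None):
    """Return (argv, env) for the repro, optionally overriding the backend.

    `build_repro` is a hypothetical helper for illustration only.
    """
    env = dict(os.environ)  # copy so the caller's environment is untouched
    if backend is not None:
        # Assumption: this vLLM build honors VLLM_ATTENTION_BACKEND.
        env["VLLM_ATTENTION_BACKEND"] = backend
    return shlex.split(REPRO_CMD), env
```

The result can be passed to `subprocess.run(argv, env=env, check=True)` from the examples directory, once with no override (default backend) and once with, say, `"ROCM_AITER_FA"`, to check whether the crash is specific to the default backend.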

Test Plan

pytest -v -s tests/v1/e2e/test_spec_decode.py::test_eagle_correctness

CI: https://buildkite.com/vllm/amd-ci/builds/814

Signed-off-by: zhewenli <[email protected]>
mergify bot commented Nov 3, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @zhewenl.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Nov 3, 2025