
[Bug][CI Failure]: EAGLE Spec Decode failing with Triton Attention Backend #27619

@zhewenl

Description


Name of failing test

test_eagle_correctness[TRITON_ATTN-qwen3_eagle3]
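To rerun only this parametrization locally, its pytest node ID can be passed directly. This is a minimal sketch, assuming the test lives in `tests/v1/e2e/test_spec_decode.py`. That path is an assumption, so adjust it to wherever `test_eagle_correctness` is defined in your checkout. The helper name `node_id` is hypothetical.

```shell
#!/bin/sh
# ASSUMPTION: location of test_eagle_correctness; verify it in your vLLM checkout.
TEST_FILE="tests/v1/e2e/test_spec_decode.py"

# Hypothetical helper: build the node ID for one parametrization.
# $1 = attention backend id, $2 = model id, as they appear in the test name.
node_id() {
  printf '%s::test_eagle_correctness[%s-%s]\n' "$TEST_FILE" "$1" "$2"
}

# Usage (quote the ID so the shell does not glob on the brackets):
#   pytest -v "$(node_id TRITON_ATTN qwen3_eagle3)"
```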

Basic information

  • Flaky test
  • Can reproduce locally
  • Caused by external libraries (e.g. bug in transformers)

🧪 Describe the failing test

Summary

EAGLE speculative decoding fails on AMD MI325X (gfx942) GPUs with an HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION error. The error occurs as soon as prompt processing starts, after CUDA graph capture has completed successfully.
It can be reproduced with speculative decoding + EAGLE + the Triton attention backend (the default on AMD), for example:

python3 offline_inference/spec_decode.py --test --method eagle --num_spec_tokens 3 --dataset-name hf --dataset-path philschmid/mt-bench --num-prompts 80 --temp 0 --top-p 1.0 --top-k -1 --tp 1 --enable-chunked-prefill --max-model-len 2048

Error Details

:0:rocdevice.cpp:3675: Callback: Queue 0x7f4db0300000 aborting with error:
HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION: The agent attempted to access memory beyond the largest legal address. code: 0x29

(Other backends, e.g. ROCM_AITER_FA or ROCM_AITER_UNIFIED_ATTN, may work, but TritonAttentionBackend is the default attention backend on AMD: gist:fb8bbb2cbde391905d86908ca4a46c02)
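One way to confirm that the failure is tied to the attention backend is to rerun the reproducer with the backend pinned explicitly instead of relying on the platform default. This is a sketch, assuming the backend is selected through the `VLLM_ATTENTION_BACKEND` environment variable and that `TRITON_ATTN` names the Triton backend (as in the failing test's ID). The helpers `run_backend` and `compare_backends` and the `DRY_RUN` switch are hypothetical; with `DRY_RUN=1` each command is printed instead of run.

```shell
#!/bin/sh
# ASSUMPTION: VLLM_ATTENTION_BACKEND selects the attention backend in this vLLM build.
BACKENDS="TRITON_ATTN ROCM_AITER_FA ROCM_AITER_UNIFIED_ATTN"
REPRO="offline_inference/spec_decode.py --test --method eagle --num_spec_tokens 3 --dataset-name hf --dataset-path philschmid/mt-bench --num-prompts 80 --temp 0 --top-p 1.0 --top-k -1 --tp 1 --enable-chunked-prefill --max-model-len 2048"

# Run (or, with DRY_RUN=1, print) the reproducer under backend $1.
run_backend() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "VLLM_ATTENTION_BACKEND=$1 python3 $REPRO"
  else
    # $REPRO is intentionally unquoted so it splits into separate arguments
    VLLM_ATTENTION_BACKEND="$1" python3 $REPRO
  fi
}

# Try every backend and report which ones exit successfully.
compare_backends() {
  for b in $BACKENDS; do
    if run_backend "$b"; then
      echo "$b: ok"
    else
      echo "$b: FAILED"
    fi
  done
}

# Usage:
#   DRY_RUN=1 compare_backends   # inspect the commands first
#   compare_backends             # actually run them on the MI325X host
```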

What Works

  1. Model initialization
  2. Weight loading (target model + EAGLE draft model)
  3. CUDA graph capturing (both PIECEWISE and FULL modes)
  4. KV cache allocation

What Fails

Prompt processing: fails immediately when execute_model is called, before even the first token is processed.

Error Call Stack

multiproc_executor.py:694 worker_busy_loop
→ worker_base.py:353 execute_model
→ gpu_worker.py:491 execute_model
→ gpu_model_runner.py:2512 execute_model
→ gpu_model_runner.py:2404 _model_forward
→ self.model() [MEMORY VIOLATION]
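Because GPU kernels are launched asynchronously, the Python frame reported above (`self.model()`) is not necessarily the kernel that made the bad access. This is a sketch for narrowing it down, assuming the installed ROCm runtime honors the HIP debugging variables `HIP_LAUNCH_BLOCKING` and `AMD_SERIALIZE_KERNEL`; check the documentation for your ROCm version. The wrapper name `serialized` is hypothetical. Serializing launches is slow and may still not isolate a fault that happens inside a replayed CUDA graph.

```shell
#!/bin/sh
# Hypothetical wrapper: run any command with HIP kernel launches serialized,
# so an asynchronous fault is reported nearer to the kernel that triggered it.
# ASSUMPTION: both variables are honored by the installed ROCm runtime.
serialized() {
  # HIP_LAUNCH_BLOCKING=1: make each kernel launch synchronous
  # AMD_SERIALIZE_KERNEL=3: wait for completion before and after each launch
  HIP_LAUNCH_BLOCKING=1 AMD_SERIALIZE_KERNEL=3 "$@"
}

# Usage with the reproducer from this issue:
#   serialized python3 offline_inference/spec_decode.py --test --method eagle \
#     --num_spec_tokens 3 --dataset-name hf --dataset-path philschmid/mt-bench \
#     --num-prompts 80 --temp 0 --top-p 1.0 --top-k -1 --tp 1 \
#     --enable-chunked-prefill --max-model-len 2048
```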

Affecting tests: V1 Test e2e + engine, Example Test

📝 History of failing test

https://buildkite.com/vllm/ci/builds/36286#019a1ead-5029-45e3-b576-d1ac8cd5ac43
https://buildkite.com/vllm/amd-ci/builds/632#019a27b6-9f6a-4b99-842d-55eef24ea7cd
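When going through further CI builds like the ones above, a quick check can tell whether a downloaded log shows this issue's signature or a different failure. This is a minimal sketch; the helper name `matches_aperture_violation` and the log file path are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if log file $1 contains this issue's error.
matches_aperture_violation() {
  # -q: no output, only the exit status; -F: match a fixed string, not a regex
  grep -qF 'HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION' "$1"
}

# Usage:
#   matches_aperture_violation build.log && echo "matches #27619"
```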

CC List.

@mxz297 @yeqcharlotte @Alexei-V-Ivanov-AMD @luccafong @njhill @LucasWilkinson

Labels: bug, ci-failure, speculative-decoding