[Bug] trtllm-gen attention kernel issues for batch size 1 #1898

@bkryu

Description

The issue description and a test case were added in PR #1897. They are reproduced here:

The trtllm-gen attention kernels fail tests when the batch size is 1. With batch size 1:

[STILL REQUIRES A FIX] test_trtllm_batch_decode: produces incorrect outputs with newly added parameters

## Running pytest ./tests/attention/test_trtllm_gen_attention.py::test_trtllm_batch_decode -v
>                   torch.testing.assert_close(
                        output.float(),
                        output_wrapper.float(),
                        rtol=1e-1,
                        atol=1e-1,
                    )
E                   AssertionError: Tensor-likes are not close!
E                   
E                   Mismatched elements: 1480 / 8192 (18.1%)
E                   Greatest absolute difference: 64.021484375 at index (0, 46, 106) (up to 0.1 allowed)
E                   Greatest relative difference: 1.625 at index (0, 56, 109) (up to 0.1 allowed)
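
To help read the numbers above: torch.testing.assert_close counts an element as mismatched when |actual - expected| > atol + rtol * |expected|. With rtol=1e-1 and atol=1e-1, an absolute difference of about 64 is far outside the tolerance. Below is a minimal pure-Python sketch of that elementwise check. The helper names is_close and mismatch_report are hypothetical, used for illustration only; they are not part of PyTorch or of the test suite.

```python
def is_close(actual: float, expected: float,
             rtol: float = 1e-1, atol: float = 1e-1) -> bool:
    """Elementwise criterion: |actual - expected| <= atol + rtol * |expected|."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


def mismatch_report(actual, expected, rtol=1e-1, atol=1e-1):
    """Return (mismatched_count, total, greatest_abs_diff) for two flat sequences.

    Hypothetical helper that mimics the summary lines of the assertion
    message; it does not reproduce PyTorch's implementation.
    """
    pairs = list(zip(actual, expected))
    diffs = [abs(a - e) for a, e in pairs]
    mismatched = sum(not is_close(a, e, rtol, atol) for a, e in pairs)
    return mismatched, len(pairs), max(diffs)


# Illustrative values only (not taken from the failing test):
# the first two elements are within tolerance, the third is not.
report = mismatch_report([1.0, 2.0, 66.0], [1.05, 2.0, 2.0])
```

Because the tolerance scales with |expected|, the 18.1% mismatch rate at rtol=atol=0.1 points to genuinely wrong outputs rather than to ordinary low-precision rounding noise.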

[UPDATE: NOW FIXED in #1912] test_trtllm_gen_prefill_deepseek: can trigger an IMA with the newly added parameters

## Running pytest ./tests/attention/test_trtllm_gen_attention.py::test_trtllm_gen_prefill_deepseek -v
>           default_generator.manual_seed(seed)
E           torch.AcceleratorError: CUDA error: an illegal memory access was encountered
E           CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
E           For debugging consider passing CUDA_LAUNCH_BLOCKING=1
E           Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

/opt/conda/envs/py312/lib/python3.12/site-packages/torch/cuda/random.py:129: AcceleratorError
