Description of the issue, with the test case added in PR #1897, reproduced here:

Trtllm-gen's attention kernels have been found to fail tests when the batch size is 1. With batch size 1:
[STILL REQUIRES A FIX] test_trtllm_batch_decode: produces incorrect outputs with the newly added parameters
## Running pytest ./tests/attention/test_trtllm_gen_attention.py::test_trtllm_batch_decode -v
```
>       torch.testing.assert_close(
            output.float(),
            output_wrapper.float(),
            rtol=1e-1,
            atol=1e-1,
        )
E       AssertionError: Tensor-likes are not close!
E
E       Mismatched elements: 1480 / 8192 (18.1%)
E       Greatest absolute difference: 64.021484375 at index (0, 46, 106) (up to 0.1 allowed)
E       Greatest relative difference: 1.625 at index (0, 56, 109) (up to 0.1 allowed)
```
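For context on how the failure above is reported, here is a minimal sketch (with made-up tensors, not the actual kernel outputs) of the comparison `torch.testing.assert_close` performs: an element passes when its absolute difference is within `atol + rtol * |expected|`, and the error message summarizes the mismatched-element count and the greatest absolute/relative differences.

```python
import torch

# Hypothetical stand-ins for output_wrapper (reference) and output (candidate).
reference = torch.zeros(4)
candidate = torch.tensor([0.0, 0.05, 0.2, 1.0])  # last two exceed the tolerance

# assert_close passes iff |candidate - reference| <= atol + rtol * |reference|.
try:
    torch.testing.assert_close(candidate, reference, rtol=1e-1, atol=1e-1)
except AssertionError as exc:
    # Reports "Mismatched elements: 2 / 4" plus greatest abs/rel differences,
    # in the same format as the failure above.
    print(exc)

# The same statistics can be computed by hand:
abs_diff = (candidate - reference).abs()
tol = 1e-1 + 1e-1 * reference.abs()
print((abs_diff > tol).sum().item())  # mismatched element count
print(abs_diff.max().item())          # greatest absolute difference
```

With an 18.1% mismatch rate and a greatest absolute difference of 64, the failure looks like a genuinely wrong output rather than a tolerance that is merely too tight.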
[UPDATE: NOW FIXED in #1912] test_trtllm_gen_prefill_deepseek: can trigger an illegal memory access (IMA) with the newly added parameters
## Running pytest ./tests/attention/test_trtllm_gen_attention.py::test_trtllm_gen_prefill_deepseek -v
```
>       default_generator.manual_seed(seed)
E       torch.AcceleratorError: CUDA error: an illegal memory access was encountered
E       CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
E       For debugging consider passing CUDA_LAUNCH_BLOCKING=1
E       Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

/opt/conda/envs/py312/lib/python3.12/site-packages/torch/cuda/random.py:129: AcceleratorError
```
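Note that the stack trace above points at `manual_seed`, which is almost certainly not the real fault site: CUDA errors are reported asynchronously, so the IMA surfaces at whatever API call next synchronizes with the device. Following the error message's own suggestion, setting `CUDA_LAUNCH_BLOCKING=1` makes kernel launches synchronous so the failure is attributed to the offending launch. A small sketch of how to apply it from Python (it can equally be set in the shell before invoking pytest):

```python
import os

# Must be set before the first CUDA context is created (i.e. before any
# CUDA work happens in torch); setting it later has no effect.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# With this set, re-running the failing test should report the illegal
# memory access at the kernel launch that caused it, not at manual_seed.
print(os.environ["CUDA_LAUNCH_BLOCKING"])
```

This only improves error attribution; the underlying out-of-bounds access in the batch-size-1 path was fixed in #1912.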