from @drisspg (copied from https://github.com/pytorch-labs/float8_experimental/issues/111)

### Summary

cuDNN supports fused_attention with FP8 inputs. See: https://docs.nvidia.com/deeplearning/cudnn/developer-guide/index.html#flash-fused-multi-head-att-fp8

There already exists a PR on PyTorch to add support for the cuDNN backend to SDPA: https://github.com/pytorch/pytorch/pull/101916

This seems like a potential path toward attention support for FP8.
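For context, a minimal sketch of where this would plug in, assuming PyTorch >= 2.1: the fp16 `scaled_dot_product_attention` call below is what works today, while the float8 path is hypothetical until the cuDNN backend PR lands (the `torch.float8_e4m3fn` dtype exists, but SDPA does not currently accept float8 inputs).

```python
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
batch, heads, seq_len, head_dim = 2, 8, 128, 64

q = torch.randn(batch, heads, seq_len, head_dim, device=device, dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Today's path: SDPA dispatches to a fused fp16/bf16 attention backend.
out = F.scaled_dot_product_attention(q, k, v)

# Hypothetical FP8 path: the float8 dtype exists in recent PyTorch, but SDPA
# does not accept it; the cuDNN backend from pytorch/pytorch#101916 is what
# would let fused attention consume these tensors directly.
if hasattr(torch, "float8_e4m3fn"):
    q_fp8 = q.to(torch.float8_e4m3fn)  # illustrative; a real path also needs per-tensor scales
    # out_fp8 = F.scaled_dot_product_attention(q_fp8, k_fp8, v_fp8)  # not supported yet
```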