from @drisspg (copied from https://github.com/pytorch-labs/float8_experimental/issues/111)

### Summary

cuDNN supports fused_attention with FP8 inputs. See: https://docs.nvidia.com/deeplearning/cudnn/developer-guide/index.html#flash-fused-multi-head-att-fp8

There already exists a PR on PyTorch to add support for the cuDNN backend to SDPA: https://github.com/pytorch/pytorch/pull/101916

This seems like a potential path toward attention support for FP8.
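For context, a minimal sketch of where this would plug in, assuming PyTorch >= 2.1: the fp16 `scaled_dot_product_attention` call below is what works today, while the float8 path is hypothetical until the cuDNN backend PR lands (the `torch.float8_e4m3fn` dtype exists, but SDPA does not currently accept float8 inputs).

```python
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
batch, heads, seq_len, head_dim = 2, 8, 128, 64

q = torch.randn(batch, heads, seq_len, head_dim, device=device, dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Today's path: SDPA dispatches to a fused fp16/bf16 attention backend.
out = F.scaled_dot_product_attention(q, k, v)

# Hypothetical FP8 path: the float8 dtype exists in recent PyTorch, but SDPA
# does not accept it; the cuDNN backend from pytorch/pytorch#101916 is what
# would let fused attention consume these tensors directly.
if hasattr(torch, "float8_e4m3fn"):
    q_fp8 = q.to(torch.float8_e4m3fn)  # illustrative; a real path also needs per-tensor scales
    # out_fp8 = F.scaled_dot_product_attention(q_fp8, k_fp8, v_fp8)  # not supported yet
```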