Memory efficient attention not working with fp16 weights #1195

@apolinario

Description

Describe the bug

Following the example code available in the 0.7.0 release:

from diffusers import StableDiffusionPipeline
import torch

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    revision="fp16",
    torch_dtype=torch.float16,
).to("cuda")

pipe.enable_xformers_memory_efficient_attention()

with torch.inference_mode():
    sample = pipe("a small cat")

I'm getting the following error:

RuntimeError: expected scalar type Half but found Float
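A minimal sketch of a possible workaround, assuming the error comes from a float32 tensor reaching the half-precision xformers attention op: running the call under torch.autocast lets PyTorch cast eligible CUDA ops to float16 automatically. torch.autocast and the pipeline call are standard PyTorch/diffusers APIs, but this is untested as a fix for this specific issue.

from diffusers import StableDiffusionPipeline
import torch

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    revision="fp16",
    torch_dtype=torch.float16,
).to("cuda")
pipe.enable_xformers_memory_efficient_attention()

# autocast casts eligible CUDA ops to float16, which may avoid a
# float32 intermediate slipping into the half-precision attention op
with torch.inference_mode(), torch.autocast("cuda"):
    sample = pipe("a small cat")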

System Info

diffusers==0.7.2
pytorch==1.12.1
