Memory efficient attention not working with fp16 weights #1195

@apolinario

Description

Describe the bug

Following the example code available in the 0.7.0 release:

from diffusers import StableDiffusionPipeline
import torch

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    revision="fp16",
    torch_dtype=torch.float16,
).to("cuda")

pipe.enable_xformers_memory_efficient_attention()

with torch.inference_mode():
    sample = pipe("a small cat")

I'm getting the following error:

RuntimeError: expected scalar type Half but found Float
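A minimal sketch of a possible workaround, assuming the error comes from a float32 tensor reaching the half-precision xformers attention op: running the call under torch.autocast lets PyTorch cast eligible CUDA ops to float16 automatically. torch.autocast and the pipeline call are standard PyTorch/diffusers APIs, but this is untested as a fix for this specific issue.

from diffusers import StableDiffusionPipeline
import torch

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    revision="fp16",
    torch_dtype=torch.float16,
).to("cuda")
pipe.enable_xformers_memory_efficient_attention()

# autocast casts eligible CUDA ops to float16, which may avoid a
# float32 intermediate slipping into the half-precision attention op
with torch.inference_mode(), torch.autocast("cuda"):
    sample = pipe("a small cat")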

System Info

diffusers==0.7.2
pytorch==1.12.1
