
Conversation

@patil-suraj
Contributor

Cast the hidden_states returned by memory_efficient_attention_xformers to the dtype of the input, as some versions of xformers return the output in fp32.

Fixes #1195
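For context, a minimal sketch of the kind of change described above, assuming a diffusers-style attention block; the method shape and the `reshape_batch_dim_to_heads` helper are assumptions, not the exact diff:

```python
import xformers.ops


def _memory_efficient_attention_xformers(self, query, key, value):
    # xformers expects contiguous tensors
    query = query.contiguous()
    key = key.contiguous()
    value = value.contiguous()
    hidden_states = xformers.ops.memory_efficient_attention(query, key, value, attn_bias=None)
    # Some xformers versions return the output in fp32 regardless of the input
    # dtype, so cast back to the query dtype to keep fp16 pipelines working.
    hidden_states = hidden_states.to(query.dtype)
    return self.reshape_batch_dim_to_heads(hidden_states)
```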

@patil-suraj patil-suraj requested review from NouamaneTazi and patrickvonplaten and removed request for NouamaneTazi November 8, 2022 16:12
Contributor

@patrickvonplaten patrickvonplaten left a comment


Thanks!

@patil-suraj patil-suraj merged commit 5786b0e into main Nov 8, 2022
@patil-suraj patil-suraj deleted the dtype-xformers branch November 8, 2022 16:15
@HuggingFaceDocBuilder

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@camenduru
Contributor

hidden_states = hidden_states.to(query.dtype) — does this mean whoever has an old xformers will get slow attention? 🙄 I wish somebody would just tell them they have an old xformers 😥 now they will never know 😥

@patil-suraj
Contributor Author

hidden_states = hidden_states.to(query.dtype) — does this mean whoever has an old xformers will get slow attention? 🙄 I wish somebody would just tell them they have an old xformers 😥 now they will never know 😥

No, it's not slow. It just always returns the output in fp32, so we need to cast it back to the dtype of the input to make sure it works in fp16.
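For illustration, a minimal sketch (not from the PR) of the failure mode the cast avoids, assuming a CUDA device and illustrative tensor shapes:

```python
import torch

# fp16 output projection, as in a pipeline loaded in half precision
to_out = torch.nn.Linear(64, 64).half().cuda()
# attention output in fp32, as some older xformers versions return it
attn_out = torch.rand(2, 77, 64, device="cuda")

try:
    to_out(attn_out)  # fp16 weights vs. fp32 input -> dtype mismatch error
except RuntimeError as err:
    print(err)

# Casting back to the input dtype, as this PR does, makes the projection work
print(to_out(attn_out.half()).dtype)  # torch.float16
```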

@camenduru
Contributor

I thought the casting would make it slow, thanks for the answer @patil-suraj

yoonseokjin pushed a commit to yoonseokjin/diffusers that referenced this pull request Dec 25, 2023


Development

Successfully merging this pull request may close these issues.

Memory efficient attention not working with fp16 weights
