Conversation

@blefaudeux
Contributor

20GB -> 16GB RAM use for some workloads, same speed (you don't have to materialize intermediates with torch.cdist)
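The memory saving described above can be sketched as follows. This is an illustration of the general `torch.cdist` trick, not the actual diffusers code: the shapes and tensor names are hypothetical stand-ins for flattened VQ latents and a codebook.

```python
import torch

# Hypothetical shapes for illustration: flattened latent vectors and a VQ codebook.
z = torch.randn(4096, 256)          # (num_latent_vectors, embedding_dim)
codebook = torch.randn(1024, 256)   # (num_codes, embedding_dim)

# Naive nearest-code lookup: expands ||z||^2 + ||e||^2 - 2 z.e^T, which
# materializes a full (4096, 1024) intermediate plus the norm tensors.
d_naive = (
    torch.sum(z ** 2, dim=1, keepdim=True)
    + torch.sum(codebook ** 2, dim=1)
    - 2 * z @ codebook.t()
)
idx_naive = torch.argmin(d_naive, dim=1)

# cdist version: argmin over Euclidean distance picks the same codes as
# argmin over squared distance, with fewer materialized intermediates.
idx_cdist = torch.argmin(torch.cdist(z, codebook), dim=1)
```

Both lookups return the same code indices (up to floating-point ties); the win is purely in peak memory during decoding.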

@blefaudeux
Contributor Author

cc @patrickvonplaten, not what we discussed, but this is an effective three-liner

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Nov 28, 2022

The documentation is not available anymore as the PR was closed or merged.

@kashif
Contributor

kashif commented Nov 28, 2022

LGTM! thanks!

@patrickvonplaten
Contributor

Hey @blefaudeux,

How do you use this feature? I think it's only used in decoding if "force_not_quantize" is set to True, no?

@blefaudeux
Contributor Author

blefaudeux commented Dec 1, 2022

> Hey @blefaudeux,
>
> How do you use this feature? I think it's only used in decoding if "force_not_quantize" is set to True, no?

It's in the super-resolution path; not doing this just eats 4GB of RAM during decoding for nothing. It's very much not perfect though, and I'm looking at better options, but it's better than not doing it :)

@blefaudeux
Contributor Author

improves on #1434

@blefaudeux
Contributor Author

cc @patil-suraj, if you're interested in high res superres

Contributor

@patil-suraj patil-suraj left a comment


Thanks a lot for the PR, this looks good to me! Will run the slow tests and then merge.

Also for high resolution upscaling, I'm exploring another option in #1521, and it seems to work well.

@patil-suraj patil-suraj merged commit 720dbfc into huggingface:main Dec 5, 2022
@blefaudeux
Contributor Author

> Thanks a lot for the PR, this looks good to me! Will run the slow tests and then merge.
>
> Also for high resolution upscaling, I'm exploring another option in #1521, and it seems to work well.

thanks for the link! For this PR I think it's always worth it since there's no tradeoff: it's just better than the previous three lines. But it's not enough to enable high res on its own, that's for sure! Any issues with borders when splitting the decode?

Another option, if the convs were depthwise, would have been to compute them depth-first (à la Reformer years ago), but that's probably not a reasonable option, so I guess splitting is as good as it gets?
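The "splitting the decode" idea being discussed can be sketched as tiled decoding with overlap, where overlapping output regions are averaged to soften border seams. Everything below is a hypothetical illustration: `decode_fn`, the tile sizes, the fixed upscale factor, and the averaging blend are assumptions for the sketch, not the approach actually used in #1521.

```python
import torch

def decode_tiled(latents, decode_fn, tile=32, overlap=8, scale=8):
    """Hypothetical tiled decode: run decode_fn on overlapping latent tiles
    and average overlapping output regions to soften border seams.
    Assumes decode_fn upsamples spatially by a fixed integer `scale`."""
    _, _, h, w = latents.shape
    stride = tile - overlap
    out = weight = None
    for top in range(0, h, stride):
        for left in range(0, w, stride):
            # Clamp the tile so it never runs past the latent edges.
            t = min(top, max(h - tile, 0))
            l = min(left, max(w - tile, 0))
            patch = latents[:, :, t:t + tile, l:l + tile]
            ph, pw = patch.shape[-2:]
            dec = decode_fn(patch)  # (B, C, ph * scale, pw * scale)
            if out is None:
                out = torch.zeros(latents.shape[0], dec.shape[1], h * scale, w * scale)
                weight = torch.zeros_like(out)
            out[:, :, t * scale:(t + ph) * scale, l * scale:(l + pw) * scale] += dec
            weight[:, :, t * scale:(t + ph) * scale, l * scale:(l + pw) * scale] += 1
    # Average wherever tiles overlapped.
    return out / weight
```

With a purely local `decode_fn` (e.g. nearest-neighbour upsampling) the tiled result matches the full decode exactly; with a real conv decoder the overlap only approximates the missing receptive field at tile borders, which is the border concern raised above.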

@blefaudeux blefaudeux deleted the vae_cdist branch December 7, 2022 07:26
tcapelle pushed a commit to tcapelle/diffusers that referenced this pull request Dec 12, 2022
sliard pushed a commit to sliard/diffusers that referenced this pull request Dec 21, 2022
yoonseokjin pushed a commit to yoonseokjin/diffusers that referenced this pull request Dec 25, 2023