StableDiffusionUpscalePipeline #1396
Conversation
pcuenca left a comment:
Awesome!
```python
noise_level = torch.cat([noise_level] * 2) if do_classifier_free_guidance else noise_level

# 6. Prepare latent variables
height, width = image.shape[2:]
```
If I understand this correctly, the latents have the same size as the input image, right? In Katherine's upscaler the low-res image was upscaled using bilinear interpolation and the latents were the size of the output image. Is this not happening here?
Ok, I was wrong. In Katherine's upscaler the latents were upscaled and provided as conditioning. Now we create latents the same size as the low-res image and the vae decodes the final result to upscale it.
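For context, a minimal sketch of the shape relationships being discussed (variable names and sizes are illustrative, not the actual pipeline code): the latents are created at the low-res image's spatial size, and the VAE decoder performs the upsampling when decoding the final result.

```python
import torch

batch, latent_channels = 1, 4
low_res = torch.randn(batch, 3, 128, 128)  # low-res conditioning image

# Latents match the *low-res* spatial size, not the output size.
height, width = low_res.shape[2:]
latents = torch.randn(batch, latent_channels, height, width)

# The VAE decoder then upsamples while decoding, so the final image
# comes out 4x larger, e.g. (1, 3, 512, 512) here.
```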
```python
unet: UNet2DConditionModel,
low_res_scheduler: DDPMScheduler,
scheduler: Union[DDIMScheduler, PNDMScheduler, LMSDiscreteScheduler],
max_noise_level: int,
```
Note that this has to have a default with the new "optional" pipeline config arguments, otherwise this breaks:

diffusers/src/diffusers/pipeline_utils.py, line 684 (at 35099b2):

```python
optional_parameters = set({k for k, v in parameters.items() if v.default is True})
```

Suggested change:

```diff
-max_noise_level: int,
+max_noise_level: int = 9
```
Not sure what a good default is here.
Overall I agree with @pcuenca's feedback that `from_pretrained` has become a bit too much of a black box / magic, but I don't really see a way around it. In general I'd strongly advise against using optional arguments in the pipeline inits, but if it makes sense here, it's fine with me!
We could maybe jump on a call in a bit to discuss this.
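For reference, the "optional pipeline config arguments" mechanism boils down to standard signature inspection: parameters with a default value are treated as optional, parameters without one as required. A minimal, self-contained illustration (not the actual `pipeline_utils` code):

```python
import inspect

def example_init(self, unet, scheduler, max_noise_level: int = 350):
    ...

params = inspect.signature(example_init).parameters

# A parameter is optional iff it carries a default value.
optional = {k for k, v in params.items() if v.default is not inspect.Parameter.empty}
required = {k for k, v in params.items() if v.default is inspect.Parameter.empty}

print(optional)  # {'max_noise_level'}
print(required)  # contains 'self', 'unet', 'scheduler'
```

This is why `max_noise_level: int,` without a default breaks here: it would not be picked up as an optional parameter.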
Aah, thanks!
Here we could default to the value used for SD2, which is 350.
Wouldn't 350 be too high for the upscaling pipeline?
Yes, but that high value is only used during training; here we set it as an indicator value that shouldn't be crossed.
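In other words, `max_noise_level` acts as a config-level ceiling on the user-supplied `noise_level`, not as a value used directly. A hedged sketch of that check (the exact error message and its placement in the real pipeline may differ):

```python
def check_noise_level(noise_level: int, max_noise_level: int = 350) -> None:
    # max_noise_level only bounds what callers may request;
    # typical inference uses a much lower noise_level (e.g. 20).
    if noise_level > max_noise_level:
        raise ValueError(
            f"`noise_level` has to be <= {max_noise_level} but is {noise_level}"
        )
```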
```python
def __call__(
    self,
    prompt: Union[str, List[str]],
    image: Union[torch.FloatTensor, PIL.Image.Image, List[PIL.Image.Image]],
```
Can the image be a latent?
No, the unet is conditioned on the low-res image itself, not on latents.
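A minimal sketch of what that conditioning looks like (shapes illustrative; the real pipeline also handles classifier-free guidance and passes `noise_level` to the unet): the low-res image is concatenated to the latents along the channel dimension at every denoising step.

```python
import torch

batch = 1
latents = torch.randn(batch, 4, 128, 128)  # denoising latents
low_res = torch.randn(batch, 3, 128, 128)  # preprocessed low-res image

# Channel-wise concat: the unet sees a 7-channel input.
latent_model_input = torch.cat([latents, low_res], dim=1)
assert latent_model_input.shape == (batch, 7, 128, 128)
```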
```python
    slice_size = self.unet.config.attention_head_dim // 2
else:
    # if `attention_head_dim` is a list, take the smallest head size
    slice_size = min(self.unet.config.attention_head_dim)
```
Should we divide by two here as well?
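For context, the two branches together choose a slice size for attention slicing; a standalone sketch of the selection logic as written (simplified, with the unet config inlined as a plain argument):

```python
from typing import List, Union

def auto_slice_size(attention_head_dim: Union[int, List[int]]) -> int:
    if isinstance(attention_head_dim, int):
        # halve the head dim to trade a little speed for memory
        return attention_head_dim // 2
    # list-valued config: take the smallest head size, currently
    # *without* halving, which is exactly what the question above asks about
    return min(attention_head_dim)
```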
```python
# Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.decode_latents with 0.18215->0.08333
def decode_latents(self, latents):
    latents = 1 / 0.08333 * latents
```
👍
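Since the "Copied from" marker says the method matches `StableDiffusionPipeline.decode_latents` with only the scaling factor changed, the full body presumably follows the standard pattern; a sketch reconstructed on that assumption:

```python
def decode_latents(self, latents):
    # 0.08333 is this model's vae scaling factor
    # (the base StableDiffusionPipeline uses 0.18215)
    latents = 1 / 0.08333 * latents
    image = self.vae.decode(latents).sample
    image = (image / 2 + 0.5).clamp(0, 1)
    # cpu, channels-last, float32 numpy, ready for PIL conversion
    image = image.cpu().permute(0, 2, 3, 1).float().numpy()
    return image
```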
patrickvonplaten left a comment:
Cool, good to merge for me!
Squashed commit summary:

* StableDiffusionUpscalePipeline
* fix a few things
* make it better
* fix image batching
* run vae in fp32
* fix docstr
* resize to mul of 64
* doc
* remove safety_checker
* add max_noise_level
* fix Copied
* begin tests
* slow tests
* default max_noise_level
* remove kwargs
* doc
* fix
* fix fast tests
* fix fast tests
* no sf
* don't offload vae

Co-authored-by: Patrick von Platen <[email protected]>
This PR adds `StableDiffusionUpscalePipeline`.
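A hedged usage sketch for the new pipeline (the checkpoint id, example image URL, and generation settings are assumptions based on the Stable Diffusion 2 release, not taken from this PR):

```python
from io import BytesIO

import requests
import torch
from PIL import Image

from diffusers import StableDiffusionUpscalePipeline

# Checkpoint id assumed here; check the released model card.
model_id = "stabilityai/stable-diffusion-x4-upscaler"
pipe = StableDiffusionUpscalePipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# Any small RGB image works as the low-res conditioning input.
url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd2-upscale/low_res_cat.png"
low_res_img = Image.open(BytesIO(requests.get(url).content)).convert("RGB")
low_res_img = low_res_img.resize((128, 128))

prompt = "a white cat"
upscaled = pipe(prompt=prompt, image=low_res_img, noise_level=20).images[0]
upscaled.save("upsampled_cat.png")
```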