- 
                Notifications
    You must be signed in to change notification settings 
- Fork 6.4k
Add VisualCloze #11377
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add VisualCloze #11377
Conversation
| @lzyhha thanks for your contribution. Could you please add some code snippets and results to the thread? | 
| Cc: @asomoza as well for testing if possible. | 
| 
 Hello, here are some test codes and their results: Model Card. | 
| The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. | 
| Hi, really nice and thank you for your work. Currently diffusers doesn't have  | 
| 
 Okay, I will make the necessary modifications. Additionally, I noticed that the call method is not functioning properly in the documentation. Could you please help check the cause? | 
Co-authored-by: Álvaro Somoza <[email protected]>
| Hello, we have removed einops from the code while ensuring the correctness of the results. @asomoza | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this. I just added few minor comments.
I am unsure about self.denoise(). On one hand I see its value but since it deviates from our usual pipeline implementations, I will defer the decision to the other reviewers.
| @a-r-r-o-w Hello, I have upgraded Ruff in my environment to version 0.9.10 and resolved the errors from the previously failed workflows. | 
| Hello, I have run make fix-copies to address the issue. I’m wondering if there is a way to help us quickly identify and fix remaining problems. | 
| @lzyhha thanks for the work! From what I see, most comments are resolved. Not sure which ones you needed our help on. However, I will let @yiyixuxu or @a-r-r-o-w take care of the final merge. | 
| Hello, I see that “Fast PyTorch Pipeline CPU tests (pull_request)” is failing after 359 minutes, but I can’t figure out the reason. @a-r-r-o-w | 
| I don't that failure was caused because of this PR. It was likely infra-related. | 
| self.generation_pipe = VisualClozeGenerationPipeline( | ||
| vae=vae, | ||
| text_encoder=text_encoder, | ||
| text_encoder_2=text_encoder_2, | ||
| tokenizer=tokenizer, | ||
| tokenizer_2=tokenizer_2, | ||
| transformer=transformer, | ||
| scheduler=scheduler, | ||
| resolution=resolution, | ||
| ) | ||
| self.upsampling_pipe = VisualClozeUpsamplingPipeline( | ||
| vae=vae, | ||
| text_encoder=text_encoder, | ||
| text_encoder_2=text_encoder_2, | ||
| tokenizer=tokenizer, | ||
| tokenizer_2=tokenizer_2, | ||
| transformer=transformer, | ||
| scheduler=scheduler, | ||
| ) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lzyhha The PR looks good to me now that it's separated, but we cannot instantiate a pipeline inside another pipeline. The intention with the refactor was to make the example code look something like:
from diffusers import VisualClozeGenerationPipeline, FluxFillPipeline as VisualClozeUpsamplingPipeline
pipe1 = VisualClozeGenerationPipeline.from_pretrained(...)
pipe2 = VisualClozeUpsamplingPipeline.from_pretrained(...)
<intermediate_results> = pipe1(...)
inputs = pipe2.prepare_upsampling(<intermediate_results>)
result = pipe2(...)
# save resultsWould you like to take a stab at refactoring this? If not, I'd be happy to make the changes this week so we can proceed to merge.
We can ignore any failing tests for now. I'll help fix them once the PR is ready for merge
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@a-r-r-o-w Hello, I can make the changes, but I still have some questions about the solution and thus need to confirm that again.
I instantiate a pipeline inside another pipeline because I follow the implementation of pipeline_stable_cascade_combined.py, which instantiate two pipelines in another pipeline as follows:
self.prior_pipe = StableCascadePriorPipeline(
          prior=prior_prior,
          text_encoder=prior_text_encoder,
          tokenizer=prior_tokenizer,
          scheduler=prior_scheduler,
          image_encoder=prior_image_encoder,
          feature_extractor=prior_feature_extractor,
      )
self.decoder_pipe = StableCascadeDecoderPipeline(
          text_encoder=text_encoder,
          tokenizer=tokenizer,
          decoder=decoder,
          scheduler=scheduler,
          vqgan=vqgan,
      )Instead, if we instantiate two pipelines via from_pretrained as follows. It seems that the same network will be instantiated twice and takes up twice the memory, since we use exactly the same model architecture and weights in both stages.
pipe1 = VisualClozeGenerationPipeline.from_pretrained(...)
pipe2 = VisualClozeUpsamplingPipeline.from_pretrained(...)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pipeline_kandinsky2_2_combined also instantiates a pipeline inside another pipeline.
I would like to confirm a way that makes inference more convenient while avoiding unnecessary memory usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, that's a good point. I was not aware of whether we want to continue maintaining the pipelines that way. Since I'm not sure on the exact approach @yiyixuxu wanted to follow, I now think that what we have currently in the PR is okay (I had something else in mind initially when she asked for two separate pipielines). Let's wait for YiYi to give the PR another look, and I'll then run the example snippets to confirm & merge. Sorry for the confusion 😅
Instead, if we instantiate two pipelines via from_pretrained as follows. It seems that the same network will be instantiated twice and takes up twice the memory, since we use exactly the same model architecture and weights in both stages.
That shouldn't be an issue if/when we modify the implementation to how I described, because we can share the underlying model components during pipeline initialization. Thanks to your comment, I just realized that we probably want to maintain consistency with Stable Cascade and Kandinsky, so will let YiYi review further
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I now understand that the method you just mentioned doesn’t require additional memory. I’ll make the changes once a final approach is confirmed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense to make 3 pipelines no? generation, upsample, and a combined this way user can run the pipelines using the API @a-r-r-o-w suggsted
and yes, yo do not need to take additional memory, either initialize separately or within the combined. with this API, you can use from_pipe to reuse the components
from diffusers import VisualClozeGenerationPipeline, FluxFillPipeline as VisualClozeUpsamplingPipeline
pipe = VisualClozeGenerationPipeline.from_pretrained(...)
pipe_upsample = VisualClozeUpsamplingPipeline.from_pipe(pipe1)
<intermediate_results> = pipe(...)
inputs = pipe_upsample.prepare_upsampling(<intermediate_results>)
result = pipe_upsample(...)
``There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yiyixuxu As one of the pipelines is FluxFillPipeline, we only need the two here I think. PR should be good to go now, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ohh sounds good
| The following tests seem to be failing: For (1), it should be adding the  For (2), I think you can just disable the dduf test by setting  For (3), not really sure what's the problem as there seems to be no error message. I can take a look soon | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
awesome job! thank you both @lzyhha @a-r-r-o-w
| self.generation_pipe = VisualClozeGenerationPipeline( | ||
| vae=vae, | ||
| text_encoder=text_encoder, | ||
| text_encoder_2=text_encoder_2, | ||
| tokenizer=tokenizer, | ||
| tokenizer_2=tokenizer_2, | ||
| transformer=transformer, | ||
| scheduler=scheduler, | ||
| resolution=resolution, | ||
| ) | ||
| self.upsampling_pipe = VisualClozeUpsamplingPipeline( | ||
| vae=vae, | ||
| text_encoder=text_encoder, | ||
| text_encoder_2=text_encoder_2, | ||
| tokenizer=tokenizer, | ||
| tokenizer_2=tokenizer_2, | ||
| transformer=transformer, | ||
| scheduler=scheduler, | ||
| ) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ohh sounds good
| ) | ||
| if upsampling_strength == 0: | ||
| # Offload all models | ||
| self.maybe_free_model_hooks() | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this needed? we have this step inside the pipeline already and they contain all the components, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. I have deleted maybe_free_model_hooks from pipeline_visualcloze_combined.
| Hello, may I kindly ask if there are any new developments. @a-r-r-o-w | 
| @lzyhha Sorry for the delay! Taking a final look at the example snippets and will merge after that | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some last updates to examples so that they are runnable with copy-paste
| Hey @lzyhha theres a bug here: https://github.com/lzyhha/diffusers/blob/main/src/diffusers/pipelines/visualcloze/visualcloze_utils.py#L113 please replace self.height with self.resize | 
What does this PR do?
Add VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning, an in-context learning based universal image generation framework, along with corresponding tests and documentation.
Here are some test codes and their results: Model Card.
Before submitting
documentation guidelines, and
here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.