Skip to content

Conversation

@lzyhha
Copy link
Contributor

@lzyhha lzyhha commented Apr 21, 2025

What does this PR do?

Add VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning, an in-context learning based universal image generation framework, along with corresponding tests and documentation.

Here are some test codes and their results: Model Card.

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@sayakpaul
Copy link
Member

@lzyhha thanks for your contribution. Could you please add some code snippets and results to the thread?

@sayakpaul sayakpaul requested a review from a-r-r-o-w April 21, 2025 12:17
@sayakpaul
Copy link
Member

Cc: @asomoza as well for testing if possible.

@lzyhha
Copy link
Contributor Author

lzyhha commented Apr 21, 2025

@lzyhha thanks for your contribution. Could you please add some code snippets and results to the thread?

Hello, here are some test codes and their results: Model Card.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@asomoza
Copy link
Member

asomoza commented Apr 21, 2025

Hi, really nice and thank you for your work. Currently diffusers doesn't have einops as a dependency. Is it possible that you refactor all the rearrange calls to just use a torch equivalent without the need of external libraries?

@lzyhha
Copy link
Contributor Author

lzyhha commented Apr 21, 2025

Hi, really nice and thank you for your work. Currently diffusers doesn't have einops as a dependency. Is it possible that you refactor all the rearrange calls to just use a torch equivalent without the need of external libraries?

Okay, I will make the necessary modifications. Additionally, I noticed that the call method is not functioning properly in the documentation. Could you please help check the cause?

@lzyhha
Copy link
Contributor Author

lzyhha commented Apr 22, 2025

Hello, we have removed einops from the code while ensuring the correctness of the results. @asomoza

Copy link
Member

@sayakpaul sayakpaul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this. I just added few minor comments.

I am unsure about self.denoise(). On one hand I see its value but since it deviates from our usual pipeline implementations, I will defer the decision to the other reviewers.

@lzyhha
Copy link
Contributor Author

lzyhha commented May 1, 2025

@a-r-r-o-w Hello, I have upgraded Ruff in my environment to version 0.9.10 and resolved the errors from the previously failed workflows.

@lzyhha
Copy link
Contributor Author

lzyhha commented May 3, 2025

Hello, I have run make fix-copies to address the issue. I’m wondering if there is a way to help us quickly identify and fix remaining problems.

@sayakpaul
Copy link
Member

@lzyhha thanks for the work! From what I see, most comments are resolved. Not sure which ones you needed our help on. However, I will let @yiyixuxu or @a-r-r-o-w take care of the final merge.

@lzyhha
Copy link
Contributor Author

lzyhha commented May 5, 2025

Hello, I see that “Fast PyTorch Pipeline CPU tests (pull_request)” is failing after 359 minutes, but I can’t figure out the reason. @a-r-r-o-w

@sayakpaul
Copy link
Member

I don't that failure was caused because of this PR. It was likely infra-related.

Comment on lines +154 to +172
self.generation_pipe = VisualClozeGenerationPipeline(
vae=vae,
text_encoder=text_encoder,
text_encoder_2=text_encoder_2,
tokenizer=tokenizer,
tokenizer_2=tokenizer_2,
transformer=transformer,
scheduler=scheduler,
resolution=resolution,
)
self.upsampling_pipe = VisualClozeUpsamplingPipeline(
vae=vae,
text_encoder=text_encoder,
text_encoder_2=text_encoder_2,
tokenizer=tokenizer,
tokenizer_2=tokenizer_2,
transformer=transformer,
scheduler=scheduler,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lzyhha The PR looks good to me now that it's separated, but we cannot instantiate a pipeline inside another pipeline. The intention with the refactor was to make the example code look something like:

from diffusers import VisualClozeGenerationPipeline, FluxFillPipeline as VisualClozeUpsamplingPipeline

pipe1 = VisualClozeGenerationPipeline.from_pretrained(...)
pipe2 = VisualClozeUpsamplingPipeline.from_pretrained(...)

<intermediate_results> = pipe1(...)
inputs = pipe2.prepare_upsampling(<intermediate_results>)
result = pipe2(...)
# save results

Would you like to take a stab at refactoring this? If not, I'd be happy to make the changes this week so we can proceed to merge.

We can ignore any failing tests for now. I'll help fix them once the PR is ready for merge

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@a-r-r-o-w Hello, I can make the changes, but I still have some questions about the solution and thus need to confirm that again.

I instantiate a pipeline inside another pipeline because I follow the implementation of pipeline_stable_cascade_combined.py, which instantiate two pipelines in another pipeline as follows:

self.prior_pipe = StableCascadePriorPipeline(
          prior=prior_prior,
          text_encoder=prior_text_encoder,
          tokenizer=prior_tokenizer,
          scheduler=prior_scheduler,
          image_encoder=prior_image_encoder,
          feature_extractor=prior_feature_extractor,
      )
self.decoder_pipe = StableCascadeDecoderPipeline(
          text_encoder=text_encoder,
          tokenizer=tokenizer,
          decoder=decoder,
          scheduler=scheduler,
          vqgan=vqgan,
      )

Instead, if we instantiate two pipelines via from_pretrained as follows. It seems that the same network will be instantiated twice and takes up twice the memory, since we use exactly the same model architecture and weights in both stages.

pipe1 = VisualClozeGenerationPipeline.from_pretrained(...)
pipe2 = VisualClozeUpsamplingPipeline.from_pretrained(...)

Copy link
Contributor Author

@lzyhha lzyhha May 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pipeline_kandinsky2_2_combined also instantiates a pipeline inside another pipeline.

I would like to confirm a way that makes inference more convenient while avoiding unnecessary memory usage.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, that's a good point. I was not aware of whether we want to continue maintaining the pipelines that way. Since I'm not sure on the exact approach @yiyixuxu wanted to follow, I now think that what we have currently in the PR is okay (I had something else in mind initially when she asked for two separate pipielines). Let's wait for YiYi to give the PR another look, and I'll then run the example snippets to confirm & merge. Sorry for the confusion 😅

Instead, if we instantiate two pipelines via from_pretrained as follows. It seems that the same network will be instantiated twice and takes up twice the memory, since we use exactly the same model architecture and weights in both stages.

That shouldn't be an issue if/when we modify the implementation to how I described, because we can share the underlying model components during pipeline initialization. Thanks to your comment, I just realized that we probably want to maintain consistency with Stable Cascade and Kandinsky, so will let YiYi review further

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I now understand that the method you just mentioned doesn’t require additional memory. I’ll make the changes once a final approach is confirmed.

Copy link
Collaborator

@yiyixuxu yiyixuxu May 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi @lzyhha @a-r-r-o-w

I think it makes sense to make 3 pipelines no? generation, upsample, and a combined this way user can run the pipelines using the API @a-r-r-o-w suggsted

and yes, yo do not need to take additional memory, either initialize separately or within the combined. with this API, you can use from_pipe to reuse the components

from diffusers import VisualClozeGenerationPipeline, FluxFillPipeline as VisualClozeUpsamplingPipeline

pipe = VisualClozeGenerationPipeline.from_pretrained(...)
pipe_upsample = VisualClozeUpsamplingPipeline.from_pipe(pipe1)

<intermediate_results> = pipe(...)
inputs = pipe_upsample.prepare_upsampling(<intermediate_results>)
result = pipe_upsample(...)
``

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yiyixuxu As one of the pipelines is FluxFillPipeline, we only need the two here I think. PR should be good to go now, no?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ohh sounds good

@a-r-r-o-w
Copy link
Contributor

The following tests seem to be failing:

FAILED tests/pipelines/visualcloze/test_pipeline_visualcloze_combined.py::VisualClozePipelineFastTests::test_callback_cfg - AttributeError: 'VisualClozePipeline' object has no attribute 'guidance_scale'
FAILED tests/pipelines/visualcloze/test_pipeline_visualcloze_combined.py::VisualClozePipelineFastTests::test_save_load_dduf - Failed: Timeout >60.0s
FAILED tests/pipelines/visualcloze/test_pipeline_visualcloze_combined.py::VisualClozePipelineFastTests::test_save_load_local

For (1), it should be adding the @property decorator for guidance scale

For (2), I think you can just disable the dduf test by setting supports_dduf = False

For (3), not really sure what's the problem as there seems to be no error message. I can take a look soon

Copy link
Collaborator

@yiyixuxu yiyixuxu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome job! thank you both @lzyhha @a-r-r-o-w

Comment on lines +154 to +172
self.generation_pipe = VisualClozeGenerationPipeline(
vae=vae,
text_encoder=text_encoder,
text_encoder_2=text_encoder_2,
tokenizer=tokenizer,
tokenizer_2=tokenizer_2,
transformer=transformer,
scheduler=scheduler,
resolution=resolution,
)
self.upsampling_pipe = VisualClozeUpsamplingPipeline(
vae=vae,
text_encoder=text_encoder,
text_encoder_2=text_encoder_2,
tokenizer=tokenizer,
tokenizer_2=tokenizer_2,
transformer=transformer,
scheduler=scheduler,
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ohh sounds good

)
if upsampling_strength == 0:
# Offload all models
self.maybe_free_model_hooks()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this needed? we have this step inside the pipeline already and they contain all the components, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. I have deleted maybe_free_model_hooks from pipeline_visualcloze_combined.

@lzyhha
Copy link
Contributor Author

lzyhha commented May 9, 2025

Hello, may I kindly ask if there are any new developments. @a-r-r-o-w

@a-r-r-o-w
Copy link
Contributor

@lzyhha Sorry for the delay! Taking a final look at the example snippets and will merge after that

Copy link
Contributor

@a-r-r-o-w a-r-r-o-w left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some last updates to examples so that they are runnable with copy-paste

@a-r-r-o-w a-r-r-o-w merged commit 4f438de into huggingface:main May 12, 2025
12 checks passed
@github-project-automation github-project-automation bot moved this from In Progress to Done in Diffusers Roadmap 0.36 May 12, 2025
@Abhinay1997
Copy link
Contributor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

7 participants