integrate ort into textual-inversion and text-to-image examples #1576
Conversation
The documentation is not available anymore as the PR was closed or merged.
@anton-l could I get a review on these scripts integrating ort with the remaining 2 stable diffusion tasks?
patil-suraj left a comment
Looks good to me, thanks a lot for adding these examples! Would be nice to add a section in the readme, showing how to use this script along with installation instructions.
accelerator.wait_for_everyone()
if accelerator.is_main_process:
    unet = accelerator.unwrap_model(unet)
    if args.use_ema:
        ema_unet.copy_to(unet.parameters())

    pipeline = StableDiffusionPipeline.from_pretrained(
        args.pretrained_model_name_or_path,
        text_encoder=text_encoder,
        vae=vae,
        unet=unet,
        revision=args.revision,
    )
    pipeline.save_pretrained(args.output_dir)
Is it fine to save the unet directly given it's wrapped in ORTModule?
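For context (not part of this PR): one way to sidestep this is to keep a handle on the plain torch.nn.Module before wrapping it, since ORTModule wraps the original module and, as far as I understand, shares its parameters rather than copying them. The name unet_for_saving and the import path below are assumptions, sketched against the public diffusers text-to-image example rather than the exact diff:

# Rough sketch, not the PR's code: keep the unwrapped unet around so that
# save_pretrained() serializes a plain PyTorch module instead of the ORTModule
# wrapper. Assumes ORTModule shares parameters with the wrapped module, so the
# original object sees the trained weights.
from diffusers import StableDiffusionPipeline, UNet2DConditionModel
from onnxruntime.training.ortmodule import ORTModule  # import path is an assumption

unet = UNet2DConditionModel.from_pretrained(
    args.pretrained_model_name_or_path, subfolder="unet", revision=args.revision
)
unet_for_saving = unet      # hypothetical helper reference to the plain nn.Module
unet = ORTModule(unet)      # wrapper used only during the training loop

# ... training loop ...

if accelerator.is_main_process:
    pipeline = StableDiffusionPipeline.from_pretrained(
        args.pretrained_model_name_or_path,
        text_encoder=text_encoder,
        vae=vae,
        unet=unet_for_saving,   # hand the unwrapped module to the pipeline
        revision=args.revision,
    )
    pipeline.save_pretrained(args.output_dir)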
    subfolder="unet",
    revision=args.revision,
)
unet = ORTModule(unet)
Maybe a dumb question: which model do we need to wrap during training? Does it need to be trainable, or can any module be wrapped? Because in textual inversion the unet is not trained.
@patil-suraj interesting, I did not catch this. Is it the text_encoder that gets trained? Is that a BERT-based model or some other architecture?
Yes, in textual inversion only the text embedding layer is trained, not even the full text encoder. And it's a CLIPTextModel.
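To make that concrete (a sketch, not code from this PR): in the public textual-inversion example only the token-embedding matrix of the CLIPTextModel receives gradients, while the unet and vae are fully frozen, so wrapping the unet in ORTModule is not about trainability. The exact attribute paths below follow the diffusers example and should be treated as assumptions:

import torch
from transformers import CLIPTextModel
from diffusers import AutoencoderKL, UNet2DConditionModel

text_encoder = CLIPTextModel.from_pretrained(
    args.pretrained_model_name_or_path, subfolder="text_encoder", revision=args.revision
)
vae = AutoencoderKL.from_pretrained(
    args.pretrained_model_name_or_path, subfolder="vae", revision=args.revision
)
unet = UNet2DConditionModel.from_pretrained(
    args.pretrained_model_name_or_path, subfolder="unet", revision=args.revision
)

# Freeze the unet and vae entirely; they only run forward passes.
unet.requires_grad_(False)
vae.requires_grad_(False)

# Freeze everything in the text encoder except the token embeddings,
# which hold the new concept's learnable vector.
text_encoder.text_model.encoder.requires_grad_(False)
text_encoder.text_model.final_layer_norm.requires_grad_(False)
text_encoder.text_model.embeddings.position_embedding.requires_grad_(False)

# Only the token-embedding parameters go to the optimizer.
optimizer = torch.optim.AdamW(
    text_encoder.get_input_embeddings().parameters(), lr=args.learning_rate
)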