feat: Qwen-Image-Edit inference and training #473
Conversation
Token max length calculation is fixed. Please re-run Text Encoder output caching.
Thank you for testing! The model weights consume just under 40GB, and the tensor sequence length is twice that of Qwen-Image training, so memory consumption is quite high. Although there are quality issues, specifying …
By removing unnecessary variables, peak memory was reduced by about 1GB when training 1328x1328 Qwen-Image-Edit.
The original Diffusers implementation required the generated image and the control image to have the same resolution, but this appears to have been a bug in Diffusers (huggingface/diffusers#12188). Allowing control images of any size, as in FLUX.1 Kontext training, may reduce memory consumption.
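The memory impact of the control-image size can be estimated from the transformer sequence length. Assuming the usual 8x VAE downscale plus 2x2 patchification (so one token per 16x16 pixel block; this factor is an assumption, not taken from the PR), each image contributes roughly (H/16)·(W/16) tokens, which is why a same-size control image doubles the sequence and a smaller one trims it:

```python
def image_tokens(height: int, width: int, vae_scale: int = 8, patch: int = 2) -> int:
    """Approximate transformer tokens contributed by one image.

    Assumes 8x VAE downscaling and 2x2 patchification (one token per
    16x16 pixel block) -- an illustrative assumption, not the exact
    musubi-tuner implementation.
    """
    factor = vae_scale * patch  # 16 pixels per token side
    return (height // factor) * (width // factor)

target = image_tokens(1328, 1328)           # 6889 tokens
same_size_control = image_tokens(1328, 1328)
smaller_control = image_tokens(1024, 1024)  # 4096 tokens

print(target + same_size_control)  # same-size control: 13778 tokens
print(target + smaller_control)    # smaller control:   10985 tokens
```

Under this estimate, a 1024x1024 control image cuts roughly 20% of the tokens versus a same-size 1328x1328 control, and attention cost grows superlinearly with sequence length.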
I feel that Qwen-Image-Edit learns very slowly, and I'm not sure whether it's a problem with the model itself... |
modelscope/DiffSynth-Studio#814: I checked the DiffSynth code, and it seems they modified the TE template and the drop index when using Qwen-Image-Edit.
We are already doing that 😄:
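The drop-index idea referenced above can be sketched as follows: the chat-template prefix tokens are run through the text encoder along with the prompt, and the corresponding leading positions are then sliced off the encoder outputs before they reach the diffusion model. All names and the drop count here are illustrative, not the actual musubi-tuner code:

```python
def drop_template_tokens(hidden_states, attention_mask, drop_idx):
    """Remove the first `drop_idx` positions (the chat-template prefix)
    from text-encoder outputs. The correct `drop_idx` depends on the
    template in use; the value below is illustrative only."""
    return hidden_states[drop_idx:], attention_mask[drop_idx:]

# Toy example with lists standing in for tensors:
hs = ["<im_start>", "system", "...", "prompt_tok_1", "prompt_tok_2"]
mask = [1, 1, 1, 1, 1]
hs2, mask2 = drop_template_tokens(hs, mask, drop_idx=3)
print(hs2)  # ['prompt_tok_1', 'prompt_tok_2']
```

If the template changes (as in edit mode, where image placeholders are added), the drop index must change with it, which is what the DiffSynth modification addresses.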
…some difficulties in the same training.
If neither option is present, the control image will be resized to the same size as the target image. Please re-run latent and Text Encoder caching if these options are changed.
FLUX.1 Kontext caching/training are also updated, so those caches must be re-created.
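The default behavior described above (resizing the control image to the target size) is typically implemented as an aspect-preserving resize that covers the target, followed by a center crop. The geometry can be sketched as below; this is an illustration of the described behavior, not the exact musubi-tuner resizing code:

```python
def fit_resize_crop(src_w, src_h, dst_w, dst_h):
    """Compute an aspect-preserving resize size and a center-crop box that
    map a control image onto the target size (a sketch of 'resize to the
    same size as the target image'; the real pipeline may differ).
    Returns ((resized_w, resized_h), (left, top, right, bottom))."""
    scale = max(dst_w / src_w, dst_h / src_h)   # scale up enough to cover the target
    rw, rh = round(src_w * scale), round(src_h * scale)
    left = (rw - dst_w) // 2
    top = (rh - dst_h) // 2
    return (rw, rh), (left, top, left + dst_w, top + dst_h)

# A 1920x1080 control image mapped onto a 1328x1328 target:
size, box = fit_resize_crop(1920, 1080, 1328, 1328)
print(size, box)  # (2361, 1328) (516, 0, 1844, 1328)
```

Because the cached control latents depend on this geometry, changing the resizing options invalidates the caches, hence the instruction to re-run latent and Text Encoder caching.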
Pull Request Overview
This PR adds support for Qwen-Image-Edit inference and training, extending the existing Qwen-Image support to include image editing capabilities with control images. The implementation mirrors the structure of existing architectures but adds support for conditioning on input images for editing tasks.
- Adds `--edit` flag to enable Qwen-Image-Edit mode with control image support
- Extends existing Qwen-Image utilities to handle vision-language processing with images
- Updates dataset handling to support control images for both training and inference
Reviewed Changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| src/musubi_tuner/utils/sai_model_spec.py | Fixed architecture constant references for metadata generation |
| src/musubi_tuner/utils/safetensors_utils.py | Added utility function to find keys in safetensors files |
| src/musubi_tuner/utils/image_utils.py | Corrected documentation for image preprocessing return format |
| src/musubi_tuner/qwen_image_train_network.py | Added edit mode support with control image processing and VL encoding |
| src/musubi_tuner/qwen_image_generate_image.py | Added edit mode inference with control image handling and CLI options |
| src/musubi_tuner/qwen_image_cache_text_encoder_outputs.py | Extended to support image-conditioned text encoding for edit mode |
| src/musubi_tuner/qwen_image_cache_latents.py | Added control latent caching support for edit mode |
| src/musubi_tuner/qwen_image/qwen_image_utils.py | Added VL processor loading and image-conditioned prompt encoding functions |
| src/musubi_tuner/qwen_image/qwen_image_model.py | Optimized attention implementation and fixed RoPE computation for better memory usage |
| src/musubi_tuner/flux_kontext_train_network.py | Standardized control latent handling to match other architectures |
| src/musubi_tuner/flux_kontext_cache_latents.py | Unified control image processing and latent batching |
| src/musubi_tuner/dataset/image_video_dataset.py | Added edit mode configuration options and control image resizing logic |
| src/musubi_tuner/dataset/config_utils.py | Added configuration schema for edit mode parameters |
| src/musubi_tuner/cache_text_encoder_outputs.py | Extended to support content-requiring encoders for VL models |
| .ai/context/overview.md | Updated documentation to include Qwen-Image support |
Co-authored-by: Copilot <[email protected]>
Documentation has been added.
After testing, inputting [1024,1024] seems to result in significantly better training performance. Perhaps it needs to match the official input of 1M pixels...
By specifying … However, it seems it would also be a good idea to set the training images to [1024,1024].
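Matching the model's apparent ~1M-pixel training distribution amounts to rescaling any resolution to roughly 1024x1024 pixels of area while preserving aspect ratio. A sketch, where snapping to multiples of 32 is an assumption (different pipelines use different granularities):

```python
import math

def to_one_megapixel(w: int, h: int, target_area: int = 1024 * 1024, step: int = 32):
    """Rescale (w, h) to roughly `target_area` pixels, preserving aspect
    ratio and snapping each side to a multiple of `step`.
    step=32 is an illustrative assumption, not a documented requirement."""
    scale = math.sqrt(target_area / (w * h))
    return (round(w * scale / step) * step, round(h * scale / step) * step)

print(to_one_megapixel(1328, 1328))  # (1024, 1024)
print(to_one_megapixel(1920, 1080))  # (1376, 768)
```

This is consistent with the observation above: 1328x1328 inputs sit well above 1M pixels, while [1024,1024] lands exactly on the presumed official input area.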
Thank you! I think the default should be a conservative value, and I also use a lower learning rate than for other models, so I changed it to 5e-5. I also changed the rank (dim) to 16 because the LoRA becomes huge at 32.
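The file-size effect of halving the rank can be estimated directly: a LoRA pair for one linear layer adds rank·(in_features + out_features) parameters, so the total size scales linearly with rank. The layer dimensions and count below are illustrative, not the actual Qwen-Image-Edit architecture:

```python
def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """Parameter count of one LoRA pair (A: in_features x rank, B: rank x out_features)."""
    return rank * (in_features + out_features)

# Illustrative layer sizes and count (not the real Qwen-Image-Edit dims):
layers = [(3072, 3072)] * 60
for r in (32, 16):
    total = sum(lora_params(i, o, r) for i, o in layers)
    print(f"rank {r}: ~{total * 2 / 1e6:.0f} MB at bf16")
```

Since size is linear in rank, dropping from 32 to 16 halves the saved LoRA regardless of which layers are adapted.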
@kohya-ss
Thank you! Hmm... It would probably be better to make this available as an option. It might be a good idea to add the …


- `flux_kontext_no_resize_control` is not supported. Control images are resized/cropped to the target image size.
- …`00001` to `--dit`) or ComfyUI repackaged weights (use bf16; not tested).
- `--edit` option added to Text Encoder output caching, inference and training scripts.
- `--control_image_path` option to specify the control image for the inference script, like FLUX.1 Kontext.
- `--ci path/to/control.png` (or jpg etc.) option to specify the control image.