14 changes: 12 additions & 2 deletions .ai/context/overview.md
@@ -4,7 +4,7 @@ This file provides guidance to developers when working with code in this repository

## Project Overview

Musubi Tuner is a Python-based training framework for LoRA (Low-Rank Adaptation) models with multiple video generation architectures including HunyuanVideo, Wan2.1, FramePack, and FLUX.1 Kontext. The project focuses on memory-efficient training and inference for video generation models.
Musubi Tuner is a Python-based training framework for LoRA (Low-Rank Adaptation) models with multiple video generation architectures including HunyuanVideo, Wan2.1/2.2, FramePack, FLUX.1 Kontext and Qwen-Image. The project focuses on memory-efficient training and inference for video generation models.

## Installation and Environment

@@ -41,6 +41,9 @@ python src/musubi_tuner/fpack_train_network.py [similar args]

# FLUX.1 Kontext training
python src/musubi_tuner/flux_kontext_train_network.py [similar args]

# Qwen-Image training
python src/musubi_tuner/qwen_image_train_network.py [similar args]
```

### Inference Commands
@@ -53,6 +56,12 @@ python src/musubi_tuner/wan_generate_video.py [similar args]

# FramePack inference
python src/musubi_tuner/fpack_generate_video.py [similar args]

# FLUX.1 Kontext inference
python src/musubi_tuner/flux_kontext_generate_image.py --control_image_path path/to/control_image.png [similar args]

# Qwen-Image inference
python src/musubi_tuner/qwen_image_generate_image.py [similar args to FLUX.1 Kontext]
```

### Utility Commands
@@ -80,9 +89,10 @@ No formal test suite is present in this repository. The project relies on manual

### Architecture-Specific Modules
- `hunyuan_model/`: HunyuanVideo model implementation and utilities
- `wan/`: Wan2.1 model configurations and modules
- `wan/`: Wan2.1/2.2 model configurations and modules
- `frame_pack/`: FramePack model implementation and utilities
- `flux/`: FLUX model utilities
- `qwen_image/`: Qwen-Image model utilities

### Key Components
- **Dataset Configuration**: Uses TOML files for complex dataset setups supporting images, videos, control images, and metadata JSONL files
31 changes: 4 additions & 27 deletions README.ja.md
@@ -52,6 +52,10 @@ For Wan2.1/2.2, see also the [Wan2.1/2.2 documentation](./docs/wan.md)

- GitHub Discussions has been enabled. Please use it for community Q&A, knowledge sharing, and exchanging technical information. Use Issues for bug reports and feature requests, and Discussions for questions and sharing experiences. [Join the discussion here](https://github.com/kohya-ss/musubi-tuner/discussions)

- 2025/08/22
- Qwen-Image-Edit is now supported. PR [#473](https://github.com/kohya-ss/musubi-tuner/pull/473) See the [Qwen-Image documentation](./docs/qwen_image.md) for details. Because the changes are extensive, existing features may be affected. If you run into problems, please report them in [Issues](https://github.com/kohya-ss/musubi-tuner/issues).
- **Breaking change**: The cache format for FLUX.1 Kontext has changed with this update. Please recreate your latent caches.

- 2025/08/18
- The `--network_module networks.lora_qwen_image` option required when training with `qwen_image_train_network.py` was missing from the documentation. The [documentation](./docs/qwen_image.md#training--学習) has been corrected.

@@ -61,33 +65,6 @@ For Wan2.1/2.2, see also the [Wan2.1/2.2 documentation](./docs/wan.md)
- 2025/08/15
- The Timestep Bucketing feature has been added. It makes the timestep distribution more uniform and stabilizes training. PR [#418](https://github.com/kohya-ss/musubi-tuner/pull/418) See the [Timestep Bucketing documentation](./docs/advanced_config.md#timestep-bucketing-for-uniform-sampling--均一なサンプリングのためのtimestep-bucketing) for details.

- 2025/08/14
- `convert_lora.py` now supports Qwen-Image LoRA. PR [#444](https://github.com/kohya-ss/musubi-tuner/pull/444) Conversion to and from the Diffusers format is possible. See [Converting LoRA formats](#loraの形式の変換) for details.

- 2025/08/11
- `qwen_shift` has been added to `--timestep_sampling`. It uses the same method as Qwen-Image inference, applying a dynamic shift value based on each image's resolution. Accordingly, `qinglong` has been split into `qinglong_flux` and `qinglong_qwen`. PR [#428](https://github.com/kohya-ss/musubi-tuner/pull/428) Thanks to sdbds. See the [Qwen-Image documentation](./docs/qwen_image.md#timestep_sampling--タイムステップのサンプリング) and [Advanced Configuration](./docs/advanced_config.md#style-friendly-snr-sampler) for details.
- Added a `--lazy_loading` option to `wan_generate_video.py` for lazy loading when using the Wan2.2 high/low models. PR [#427](https://github.com/kohya-ss/musubi-tuner/pull/427) See [here](./docs/wan.md#inference--推論) for details.

- 2025/08/10
- Qwen-Image is now supported. PR [#408](https://github.com/kohya-ss/musubi-tuner/pull/408) See the [Qwen-Image documentation](./docs/qwen_image.md) for details.

- 2025/08/09
- When wandb logging is configured, sample generation images are now also logged to wandb. PR [#420](https://github.com/kohya-ss/musubi-tuner/pull/420) Thanks to xhiroga.

- 2025/08/08
- Wan2.2 is now supported. PR [#399](https://github.com/kohya-ss/musubi-tuner/pull/399) See the [Wan2.1/2.2 documentation](./docs/wan.md) for details.

Wan2.2 consists of two models, high noise and low noise; when training LoRA you can choose either one or both. Specifying timesteps accordingly is required, so please check the documentation.

- 2025/08/07
- Added new `logsnr` and `qinglong` timestep sampling methods, proposed by sdbds in PR [#407](https://github.com/kohya-ss/musubi-tuner/pull/407). Thanks to sdbds. `logsnr` specializes in style learning, while `qinglong` is a hybrid sampling method that balances style learning, model stability, and detail reproduction. See [this documentation](./docs/advanced_config.md#style-friendly-snr-sampler) for details.

- 2025/08/02
- Reduced peak memory usage when loading the FramePack and Wan2.1 models with `--fp8_scaled`. This lowers VRAM usage during model loading before training and inference.

- 2025/08/01
- Fixed a bug where block swap did not work in FLUX.1 Kontext LoRA training. [PR #402](https://github.com/kohya-ss/musubi-tuner/pull/402) and [PR #403](https://github.com/kohya-ss/musubi-tuner/pull/403) Thanks to sdbds.

### Releases

We are grateful to everyone writing articles about Musubi Tuner and developing related tools. Because this project is under active development, incompatible changes and feature additions may occur. To avoid unexpected compatibility problems, please use the [releases](https://github.com/kohya-ss/musubi-tuner/releases) as stable reference points.
34 changes: 4 additions & 30 deletions README.md
@@ -61,6 +61,10 @@ If you find this project helpful, please consider supporting its development via

- GitHub Discussions Enabled: We've enabled GitHub Discussions for community Q&A, knowledge sharing, and technical information exchange. Please use Issues for bug reports and feature requests, and Discussions for questions and sharing experiences. [Join the conversation →](https://github.com/kohya-ss/musubi-tuner/discussions)

- August 22, 2025:
- Qwen-Image-Edit support has been added. See PR [#473](https://github.com/kohya-ss/musubi-tuner/pull/473) and the [Qwen-Image documentation](./docs/qwen_image.md) for details. This change may affect existing features due to its extensive nature. If you encounter any issues, please report them in the [Issues](https://github.com/kohya-ss/musubi-tuner/issues).
- **Breaking Change**: The cache format for FLUX.1 Kontext has been changed with this update. Please recreate the latent cache.

- August 18, 2025:
- The option `--network_module networks.lora_qwen_image` was missing from the documentation for training with `qwen_image_train_network.py`. The [documentation](./docs/qwen_image.md#training--学習) has been fixed to include this information.

@@ -70,36 +74,6 @@ If you find this project helpful, please consider supporting its development via
- August 15, 2025:
- The Timestep Bucketing feature has been added, which allows for a more uniform distribution of timesteps and stabilizes training. See PR [#418](https://github.com/kohya-ss/musubi-tuner/pull/418) and the [Timestep Bucketing documentation](./docs/advanced_config.md#timestep-bucketing-for-uniform-sampling--均一なサンプリングのためのtimestep-bucketing) for details.
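
The idea behind timestep bucketing can be sketched as stratified sampling: split [0, 1) into equal buckets and draw one uniform sample per bucket, so every slice of the range is covered each pass. The following is a minimal illustration under that assumption, not the project's actual implementation; the function name and bucket count are hypothetical.

```python
import random

def bucketed_timesteps(num_buckets: int, rng: random.Random) -> list[float]:
    """Stratified (bucketed) timestep sampling: one uniform draw per bucket.

    Unlike plain uniform sampling, every 1/num_buckets slice of [0, 1)
    is hit exactly once per pass, so the timestep distribution over a
    training epoch is more even.
    """
    samples = [(i + rng.random()) / num_buckets for i in range(num_buckets)]
    rng.shuffle(samples)  # avoid feeding timesteps in ascending order
    return samples

rng = random.Random(0)
timesteps = bucketed_timesteps(10, rng)
```

Sorting the result shows exactly one sample per tenth of the range, which is the uniformity property the feature is after.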

- August 14, 2025:
- `convert_lora.py` now supports Qwen-Image LoRA models, allowing conversion to and from the Diffusers format. PR [#444](https://github.com/kohya-ss/musubi-tuner/pull/444) See [here](#convert-lora-to-another-format) for more details.

- August 11, 2025:
- Added `qwen_shift` to the `--timestep_sampling` options. This uses the same method as Qwen-Image inference, employing dynamic shift values based on the resolution of each image (typically around 2.2 for 1328x1328 images). Additionally, `qinglong` has been split into `qinglong_flux` and `qinglong_qwen`. Thanks to sdbds for [PR #428](https://github.com/kohya-ss/musubi-tuner/pull/428).

For details, see the [Qwen-Image documentation](./docs/qwen_image.md#timestep_sampling--タイムステップのサンプリング) and [Advanced Configuration](./docs/advanced_config.md#style-friendly-snr-sampler).

- Added `--lazy_loading` option for delayed loading of DiT models when using Wan2.2 high/low models in `wan_generate_video.py`. [PR #427](https://github.com/kohya-ss/musubi-tuner/pull/427) See [Wan2.2 documentation](./docs/wan.md#inference--推論) for details.
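
A resolution-dependent ("dynamic") shift of the kind described above can be sketched as follows: a shift factor is interpolated from the image's token count and then used to warp timesteps toward the noisy end. The constants below are illustrative assumptions borrowed from common flow-matching setups, not the values Qwen-Image actually uses.

```python
import math

# Illustrative constants (assumptions, not Qwen-Image's actual values).
BASE_SEQ_LEN, MAX_SEQ_LEN = 256, 4096
BASE_SHIFT, MAX_SHIFT = 0.5, 1.15

def dynamic_shift(seq_len: int) -> float:
    """Linearly interpolate mu from the token count, then exponentiate."""
    m = (MAX_SHIFT - BASE_SHIFT) / (MAX_SEQ_LEN - BASE_SEQ_LEN)
    mu = BASE_SHIFT + m * (seq_len - BASE_SEQ_LEN)
    return math.exp(mu)

def shift_timestep(t: float, shift: float) -> float:
    """Standard flow-matching time shift: pushes t toward 1 when shift > 1."""
    return shift * t / (1.0 + (shift - 1.0) * t)
```

With these constants, a larger image (more tokens) yields a larger shift, so sampled timesteps concentrate at higher noise levels; a shift of 1.0 leaves timesteps unchanged.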

- August 10, 2025:
- Added support for Qwen-Image. PR [#408](https://github.com/kohya-ss/musubi-tuner/pull/408) See [Qwen-Image documentation](./docs/qwen_image.md) for details.

- August 9, 2025:
- When logging to wandb, sample generation images are now also logged to wandb. Thanks to xhiroga for [PR #420](https://github.com/kohya-ss/musubi-tuner/pull/420).

- August 8, 2025:
- Added support for Wan2.2. [PR #399](https://github.com/kohya-ss/musubi-tuner/pull/399). See [Wan2.1/2.2 documentation](./docs/wan.md).

Wan2.2 consists of two models: high noise and low noise. During LoRA training, you can choose either one or both. Please refer to the documentation for details on specifying timesteps.
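
The high/low split described above can be thought of as routing each timestep to one of two expert models by a boundary value. The sketch below is a minimal illustration of that routing; the default boundary and the names are assumptions for this example, not Musubi Tuner's API.

```python
def select_expert(t: float, boundary: float = 0.875) -> str:
    """Route a timestep in [0, 1] to the high- or low-noise expert.

    Timesteps near 1 are the noisiest, so they go to the high-noise
    model; the rest go to the low-noise model. Training one LoRA on
    both models simply means not filtering timesteps at all.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("timestep must be in [0, 1]")
    return "high_noise" if t >= boundary else "low_noise"
```

Restricting training to one expert then amounts to sampling timesteps only from that expert's side of the boundary, which is why the timestep settings must match the chosen model.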

- August 7, 2025:
- Added new sampling methods for timesteps: `logsnr` and `qinglong`. Thank you to sdbds for proposing this in [PR #407](https://github.com/kohya-ss/musubi-tuner/pull/407). `logsnr` is designed for style learning, while `qinglong` is a hybrid sampling method that considers style learning, model stability, and detail reproduction. For details, see the [Style-friendly SNR Sampler documentation](./docs/advanced_config.md#style-friendly-snr-sampler).

- August 2, 2025:
- Reduced peak memory usage during model loading for FramePack and Wan2.1 when using `--fp8_scaled`. This reduces VRAM usage during model loading before training and inference.

- August 1, 2025:
- Fixed the issue where block swapping did not work in FLUX.1 Kontext LoRA training. Thanks to sdbds for [PR #402](https://github.com/kohya-ss/musubi-tuner/pull/402) and [PR #403](https://github.com/kohya-ss/musubi-tuner/pull/403).

### Releases

We are grateful to everyone who has been contributing to the Musubi Tuner ecosystem through documentation and third-party tools. To support these valuable contributions, we recommend working with our [releases](https://github.com/kohya-ss/musubi-tuner/releases) as stable reference points, as this project is under active development and breaking changes may occur.