14 changes: 12 additions & 2 deletions .ai/context/overview.md
@@ -4,7 +4,7 @@ This file provides guidance to developers when working with code in this repository

## Project Overview

Musubi Tuner is a Python-based training framework for LoRA (Low-Rank Adaptation) models with multiple video generation architectures including HunyuanVideo, Wan2.1, FramePack, and FLUX.1 Kontext. The project focuses on memory-efficient training and inference for video generation models.
Musubi Tuner is a Python-based training framework for LoRA (Low-Rank Adaptation) models with multiple video generation architectures including HunyuanVideo, Wan2.1/2.2, FramePack, FLUX.1 Kontext and Qwen-Image. The project focuses on memory-efficient training and inference for video generation models.

## Installation and Environment

@@ -41,6 +41,9 @@ python src/musubi_tuner/fpack_train_network.py [similar args]

# FLUX.1 Kontext training
python src/musubi_tuner/flux_kontext_train_network.py [similar args]

# Qwen-Image training
python src/musubi_tuner/qwen_image_train_network.py [similar args]
```

### Inference Commands
@@ -53,6 +56,12 @@ python src/musubi_tuner/wan_generate_video.py [similar args]

# FramePack inference
python src/musubi_tuner/fpack_generate_video.py [similar args]

# FLUX.1 Kontext inference
python src/musubi_tuner/flux_kontext_generate_image.py --control_image_path path/to/control_image.png [similar args]

# Qwen-Image inference
python src/musubi_tuner/qwen_image_generate_image.py [similar args to FLUX.1 Kontext]
```

### Utility Commands
@@ -80,9 +89,10 @@ No formal test suite is present in this repository. The project relies on manual

### Architecture-Specific Modules
- `hunyuan_model/`: HunyuanVideo model implementation and utilities
- `wan/`: Wan2.1 model configurations and modules
- `wan/`: Wan2.1/2.2 model configurations and modules
- `frame_pack/`: FramePack model implementation and utilities
- `flux/`: FLUX model utilities
- `qwen_image/`: Qwen-Image model utilities

### Key Components
- **Dataset Configuration**: Uses TOML files for complex dataset setups supporting images, videos, control images, and metadata JSONL files
31 changes: 4 additions & 27 deletions README.ja.md
@@ -52,6 +52,10 @@ For Wan2.1/2.2, see also the [Wan2.1/2.2 documentation](./docs/wan.md)

- GitHub Discussions has been enabled. Please use it for community Q&A, knowledge sharing, and exchanging technical information. Use Issues for bug reports and feature requests, and Discussions for questions and sharing experiences. [Join the discussion here](https://github.com/kohya-ss/musubi-tuner/discussions)

- 2025/08/22
- Qwen-Image-Edit is now supported. PR [#473](https://github.com/kohya-ss/musubi-tuner/pull/473) See the [Qwen-Image documentation](./docs/qwen_image.md) for details. Because the changes are extensive, existing features may be affected. If you run into problems, please report them in [Issues](https://github.com/kohya-ss/musubi-tuner/issues).
- **Breaking change**: The cache format for FLUX.1 Kontext has changed with this update. Please recreate your latent caches.

- 2025/08/18
- The `--network_module networks.lora_qwen_image` option required when training with `qwen_image_train_network.py` was missing from the documentation. The [documentation](./docs/qwen_image.md#training--学習) has been corrected.

@@ -61,33 +65,6 @@ For Wan2.1/2.2, see also the [Wan2.1/2.2 documentation](./docs/wan.md)
- 2025/08/15
- The Timestep Bucketing feature has been added. It makes the timestep distribution more uniform and stabilizes training. PR [#418](https://github.com/kohya-ss/musubi-tuner/pull/418) See the [Timestep Bucketing documentation](./docs/advanced_config.md#timestep-bucketing-for-uniform-sampling--均一なサンプリングのためのtimestep-bucketing) for details.

- 2025/08/14
- `convert_lora.py` now supports Qwen-Image LoRA. PR [#444](https://github.com/kohya-ss/musubi-tuner/pull/444) Conversion to and from the Diffusers format is possible. See [Converting LoRA formats](#loraの形式の変換) for details.

- 2025/08/11
- `qwen_shift` has been added to `--timestep_sampling`. It uses the same method as Qwen-Image inference, applying a dynamic shift value based on each image's resolution. Accordingly, `qinglong` has been split into `qinglong_flux` and `qinglong_qwen`. PR [#428](https://github.com/kohya-ss/musubi-tuner/pull/428) Thanks to sdbds. See the [Qwen-Image documentation](./docs/qwen_image.md#timestep_sampling--タイムステップのサンプリング) and [Advanced Configuration](./docs/advanced_config.md#style-friendly-snr-sampler) for details.
- Added a `--lazy_loading` option to `wan_generate_video.py` for lazy loading when using the Wan2.2 high/low models. PR [#427](https://github.com/kohya-ss/musubi-tuner/pull/427) See [here](./docs/wan.md#inference--推論) for details.

- 2025/08/10
- Qwen-Image is now supported. PR [#408](https://github.com/kohya-ss/musubi-tuner/pull/408) See the [Qwen-Image documentation](./docs/qwen_image.md) for details.

- 2025/08/09
- When wandb logging is configured, sample generation images are now also logged to wandb. PR [#420](https://github.com/kohya-ss/musubi-tuner/pull/420) Thanks to xhiroga.

- 2025/08/08
- Wan2.2 is now supported. PR [#399](https://github.com/kohya-ss/musubi-tuner/pull/399) See the [Wan2.1/2.2 documentation](./docs/wan.md) for details.

Wan2.2 consists of two models, high noise and low noise; when training LoRA you can choose either one or both. Specifying timesteps accordingly is required, so please check the documentation.

- 2025/08/07
- Added new `logsnr` and `qinglong` timestep sampling methods, proposed by sdbds in PR [#407](https://github.com/kohya-ss/musubi-tuner/pull/407). Thanks to sdbds. `logsnr` specializes in style learning, while `qinglong` is a hybrid sampling method that balances style learning, model stability, and detail reproduction. See [this documentation](./docs/advanced_config.md#style-friendly-snr-sampler) for details.

- 2025/08/02
- Reduced peak memory usage when loading the FramePack and Wan2.1 models with `--fp8_scaled`. This lowers VRAM usage during model loading before training and inference.

- 2025/08/01
- Fixed a bug where block swap did not work in FLUX.1 Kontext LoRA training. [PR #402](https://github.com/kohya-ss/musubi-tuner/pull/402) and [PR #403](https://github.com/kohya-ss/musubi-tuner/pull/403) Thanks to sdbds.

### Releases

We are grateful to everyone writing articles about Musubi Tuner and developing related tools. Because this project is under active development, incompatible changes and feature additions may occur. To avoid unexpected compatibility problems, please use the [releases](https://github.com/kohya-ss/musubi-tuner/releases) as stable reference points.
34 changes: 4 additions & 30 deletions README.md
@@ -61,6 +61,10 @@ If you find this project helpful, please consider supporting its development via

- GitHub Discussions Enabled: We've enabled GitHub Discussions for community Q&A, knowledge sharing, and technical information exchange. Please use Issues for bug reports and feature requests, and Discussions for questions and sharing experiences. [Join the conversation →](https://github.com/kohya-ss/musubi-tuner/discussions)

- August 22, 2025:
- Qwen-Image-Edit support has been added. See PR [#473](https://github.com/kohya-ss/musubi-tuner/pull/473) and the [Qwen-Image documentation](./docs/qwen_image.md) for details. This change may affect existing features due to its extensive nature. If you encounter any issues, please report them in the [Issues](https://github.com/kohya-ss/musubi-tuner/issues).
- **Breaking Change**: The cache format for FLUX.1 Kontext has been changed with this update. Please recreate the latent cache.

- August 18, 2025:
- The option `--network_module networks.lora_qwen_image` was missing from the documentation for training with `qwen_image_train_network.py`. The [documentation](./docs/qwen_image.md#training--学習) has been fixed to include this information.

@@ -70,36 +74,6 @@ If you find this project helpful, please consider supporting its development via
- August 15, 2025:
- The Timestep Bucketing feature has been added, which allows for a more uniform distribution of timesteps and stabilizes training. See PR [#418](https://github.com/kohya-ss/musubi-tuner/pull/418) and the [Timestep Bucketing documentation](./docs/advanced_config.md#timestep-bucketing-for-uniform-sampling--均一なサンプリングのためのtimestep-bucketing) for details.
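
The idea behind timestep bucketing can be sketched as stratified sampling: split [0, 1) into equal buckets and draw one uniform sample per bucket, so every slice of the range is covered each pass. The following is a minimal illustration under that assumption, not the project's actual implementation; the function name and bucket count are hypothetical.

```python
import random

def bucketed_timesteps(num_buckets: int, rng: random.Random) -> list[float]:
    """Stratified (bucketed) timestep sampling: one uniform draw per bucket.

    Unlike plain uniform sampling, every 1/num_buckets slice of [0, 1)
    is hit exactly once per pass, so the timestep distribution over a
    training epoch is more even.
    """
    samples = [(i + rng.random()) / num_buckets for i in range(num_buckets)]
    rng.shuffle(samples)  # avoid feeding timesteps in ascending order
    return samples

rng = random.Random(0)
timesteps = bucketed_timesteps(10, rng)
```

Sorting the result shows exactly one sample per tenth of the range, which is the uniformity property the feature is after.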

- August 14, 2025:
- `convert_lora.py` now supports Qwen-Image LoRA models, allowing conversion to and from the Diffusers format. PR [#444](https://github.com/kohya-ss/musubi-tuner/pull/444) See [here](#convert-lora-to-another-format) for more details.

- August 11, 2025:
- Added `qwen_shift` to the `--timestep_sampling` options. This uses the same method as Qwen-Image inference, employing dynamic shift values based on the resolution of each image (typically around 2.2 for 1328x1328 images). Additionally, `qinglong` has been split into `qinglong_flux` and `qinglong_qwen`. Thanks to sdbds for [PR #428](https://github.com/kohya-ss/musubi-tuner/pull/428).

For details, see the [Qwen-Image documentation](./docs/qwen_image.md#timestep_sampling--タイムステップのサンプリング) and [Advanced Configuration](./docs/advanced_config.md#style-friendly-snr-sampler).

- Added `--lazy_loading` option for delayed loading of DiT models when using Wan2.2 high/low models in `wan_generate_video.py`. [PR #427](https://github.com/kohya-ss/musubi-tuner/pull/427) See [Wan2.2 documentation](./docs/wan.md#inference--推論) for details.
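
A resolution-dependent ("dynamic") shift of the kind described above can be sketched as follows: a shift factor is interpolated from the image's token count and then used to warp timesteps toward the noisy end. The constants below are illustrative assumptions borrowed from common flow-matching setups, not the values Qwen-Image actually uses.

```python
import math

# Illustrative constants (assumptions, not Qwen-Image's actual values).
BASE_SEQ_LEN, MAX_SEQ_LEN = 256, 4096
BASE_SHIFT, MAX_SHIFT = 0.5, 1.15

def dynamic_shift(seq_len: int) -> float:
    """Linearly interpolate mu from the token count, then exponentiate."""
    m = (MAX_SHIFT - BASE_SHIFT) / (MAX_SEQ_LEN - BASE_SEQ_LEN)
    mu = BASE_SHIFT + m * (seq_len - BASE_SEQ_LEN)
    return math.exp(mu)

def shift_timestep(t: float, shift: float) -> float:
    """Standard flow-matching time shift: pushes t toward 1 when shift > 1."""
    return shift * t / (1.0 + (shift - 1.0) * t)
```

With these constants, a larger image (more tokens) yields a larger shift, so sampled timesteps concentrate at higher noise levels; a shift of 1.0 leaves timesteps unchanged.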

- August 10, 2025:
- Added support for Qwen-Image. PR [#408](https://github.com/kohya-ss/musubi-tuner/pull/408) See [Qwen-Image documentation](./docs/qwen_image.md) for details.

- August 9, 2025:
- When logging to wandb, sample generation images are now also logged to wandb. Thanks to xhiroga for [PR #420](https://github.com/kohya-ss/musubi-tuner/pull/420).

- August 8, 2025:
- Added support for Wan2.2. [PR #399](https://github.com/kohya-ss/musubi-tuner/pull/399). See [Wan2.1/2.2 documentation](./docs/wan.md).

Wan2.2 consists of two models: high noise and low noise. During LoRA training, you can choose either one or both. Please refer to the documentation for details on specifying timesteps.
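
The high/low split described above can be thought of as routing each timestep to one of two expert models by a boundary value. The sketch below is a minimal illustration of that routing; the default boundary and the names are assumptions for this example, not Musubi Tuner's API.

```python
def select_expert(t: float, boundary: float = 0.875) -> str:
    """Route a timestep in [0, 1] to the high- or low-noise expert.

    Timesteps near 1 are the noisiest, so they go to the high-noise
    model; the rest go to the low-noise model. Training one LoRA on
    both models simply means not filtering timesteps at all.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("timestep must be in [0, 1]")
    return "high_noise" if t >= boundary else "low_noise"
```

Restricting training to one expert then amounts to sampling timesteps only from that expert's side of the boundary, which is why the timestep settings must match the chosen model.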

- August 7, 2025:
- Added new sampling methods for timesteps: `logsnr` and `qinglong`. Thank you to sdbds for proposing this in [PR #407](https://github.com/kohya-ss/musubi-tuner/pull/407). `logsnr` is designed for style learning, while `qinglong` is a hybrid sampling method that considers style learning, model stability, and detail reproduction. For details, see the [Style-friendly SNR Sampler documentation](./docs/advanced_config.md#style-friendly-snr-sampler).

- August 2, 2025:
- Reduced peak memory usage during model loading for FramePack and Wan2.1 when using `--fp8_scaled`. This reduces VRAM usage during model loading before training and inference.

- August 1, 2025:
- Fixed the issue where block swapping did not work in FLUX.1 Kontext LoRA training. Thanks to sdbds for [PR #402](https://github.com/kohya-ss/musubi-tuner/pull/402) and [PR #403](https://github.com/kohya-ss/musubi-tuner/pull/403).

### Releases

We are grateful to everyone who has been contributing to the Musubi Tuner ecosystem through documentation and third-party tools. To support these valuable contributions, we recommend working with our [releases](https://github.com/kohya-ss/musubi-tuner/releases) as stable reference points, as this project is under active development and breaking changes may occur.