Description
I haven't fully fixed the script, but I'm really not sure how anyone has had success with it (e.g. the blog post). There are many show-stoppers when trying to get Textual Inversion working.
To debug, I started peeling away at the main Flux training script and have fixed all of the following bugs, BUT it still doesn't train well. With any hyperparams, it fails to learn a new concept, or even to overfit on it, and quickly degrades. So the following fixes are necessary, but not sufficient, to get this working.
- On a 1-node-2-gpu setup, `accelerator` wraps everything in a `DistributedSomething` class, so all the calls to `model.dtype`/`model.config` etc. error out; since they're wrapped, you actually need something like `model.module.dtype`
- It requires an `--instance-prompt` even if you're using a dataset with a custom text column
- The updated tokenizers are not getting passed into the pipeline
- `text_embeddings` get saved weirdly; it looks like a single copy gets saved no matter the checkpoint number?
- In `log_validation` there's a dtype-mismatch runtime error when training in `fp16`/`bf16`, and you need to autocast
- Also in `log_validation`, the `Generator` doesn't seem to do anything. Every generation gets performed with a different seed, so you can't watch the evolution/performance on the same samples. This is remedied if you save the RNG state of torch/torch_cuda/random, then use the RNG `Generator`, then reset the random state afterward
- `t5` training doesn't work. On this line it unfreezes `token_embedding`, which doesn't exist in t5; `shared` needs to be unfrozen instead
- Just a weird one: pulling the embeddings toward `std**0.1` makes intuitive sense, but this kind of thing should definitely come with a justification. Was this done in a paper somewhere?
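For the wrapped-model issue, a sketch of the workaround (the `Wrapper` class below is a stand-in I made up for the `DistributedSomething` wrapper; in practice `accelerator.unwrap_model(model)` does the same thing and is the cleaner fix):

```python
import torch
import torch.nn as nn

def unwrap(model: nn.Module) -> nn.Module:
    # DDP-style wrappers expose the original model as .module;
    # reading .dtype/.config on the wrapper itself errors out.
    return model.module if hasattr(model, "module") else model

class Wrapper(nn.Module):  # stand-in for the DistributedSomething wrapper
    def __init__(self, module: nn.Module):
        super().__init__()
        self.module = module

inner = nn.Linear(4, 4)
wrapped = Wrapper(inner)
assert unwrap(wrapped) is inner   # reaches through the wrapper
assert unwrap(inner) is inner     # no-op on an unwrapped model
print(unwrap(wrapped).weight.dtype)  # torch.float32
```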
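For the `log_validation` dtype mismatch, a minimal sketch of the autocast fix, assuming the error comes from fp32 activations hitting fp16/bf16 weights (the `Linear` here is just a stand-in for the half-precision pipeline):

```python
import torch

# Stand-in for half-precision model weights:
linear = torch.nn.Linear(4, 4).to(torch.bfloat16)
x = torch.randn(2, 4)  # fp32 input, as the validation path produces

# Outside autocast this matmul raises a dtype-mismatch RuntimeError;
# inside, inputs are cast to the autocast dtype automatically.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y = linear(x)

print(y.dtype)  # torch.bfloat16
```

On GPU the same pattern applies with `device_type="cuda"` and the training `weight_dtype`.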
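The RNG bookkeeping for reproducible validation samples can be sketched like this (names are illustrative, not the script's actual code): snapshot the global RNG state, run generation with a fixed-seed `Generator`, then restore so training randomness is untouched.

```python
import random
import torch

def seeded_validation(seed: int, fn):
    # Snapshot every RNG stream the pipeline might touch.
    cpu_state = torch.get_rng_state()
    py_state = random.getstate()
    cuda_states = torch.cuda.get_rng_state_all() if torch.cuda.is_available() else None
    try:
        generator = torch.Generator().manual_seed(seed)
        return fn(generator)
    finally:
        # Restore, so validation never perturbs the training RNG.
        torch.set_rng_state(cpu_state)
        random.setstate(py_state)
        if cuda_states is not None:
            torch.cuda.set_rng_state_all(cuda_states)

# Same seed -> same samples at every validation step:
a = seeded_validation(0, lambda g: torch.randn(3, generator=g))
b = seeded_validation(0, lambda g: torch.randn(3, generator=g))
assert torch.equal(a, b)
```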
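For the t5 unfreezing bug, one architecture-agnostic fix is to go through `get_input_embeddings()`, which Hugging Face models implement regardless of whether the table is called `token_embedding` or `shared` (the `FakeT5` below is a toy stand-in, not the real encoder):

```python
import torch.nn as nn

def unfreeze_input_embeddings(text_encoder) -> None:
    # get_input_embeddings() resolves to the right table per architecture,
    # e.g. `shared` on T5, so no CLIP-specific attribute path is needed.
    text_encoder.get_input_embeddings().weight.requires_grad_(True)

class FakeT5(nn.Module):  # toy stand-in mimicking T5's accessor shape
    def __init__(self):
        super().__init__()
        self.shared = nn.Embedding(10, 4)
        self.shared.weight.requires_grad_(False)

    def get_input_embeddings(self):
        return self.shared

enc = FakeT5()
unfreeze_input_embeddings(enc)
assert enc.shared.weight.requires_grad
```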
Some nice-to-haves that I wish this did out of the box (but I'd be happy if the simple path above "just worked"):
- save latent cache to disk
- aspect ratio bucketing
- 8bit backbone
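The latent-cache wish can be sketched in a few lines (hypothetical helper, not anything in diffusers): encode each image once, key the result by a content hash, and reload from disk on later epochs or runs.

```python
import hashlib
import tempfile
from pathlib import Path

import torch

def cached_latents(cache_dir: Path, key: bytes, encode_fn) -> torch.Tensor:
    # One .pt file per unique input; encode_fn stands in for the VAE encode.
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / (hashlib.sha256(key).hexdigest() + ".pt")
    if path.exists():
        return torch.load(path)
    latents = encode_fn()
    torch.save(latents, path)
    return latents

# First call encodes and writes; second call never invokes encode_fn.
cache = Path(tempfile.mkdtemp())
first = cached_latents(cache, b"sample-0", lambda: torch.randn(4, 8, 8))
second = cached_latents(cache, b"sample-0", lambda: torch.zeros(1))
assert torch.equal(first, second)
```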
Honestly, at this point I'm tempted to give up on diffusers and use something else for this current client. After sweating over fixing the above, I still haven't gotten it to work, and I still haven't been able to stress-test things like the dreambooth functionality or the whole-text-encoder finetuning feature.