Issue on flux dreambooth lora training #9237

### Describe the bug

`RuntimeError: Input type (float) and bias type (c10::Half) should be the same`

I get this error whenever I follow the Flux README: https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/README_flux.md

I followed the instructions exactly as written, but the error persists.

The SDXL version of the script works, but the Flux version fails with this error.
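The error comes from PyTorch's dtype check in `F.conv2d`: the tensor passed in is float32 while the layer's parameters are float16. Below is a minimal sketch that triggers the same kind of error with plain PyTorch. It assumes a standalone `nn.Conv2d` (with made-up channel sizes) as a stand-in for the VAE decoder's `conv_in`, and it does not use diffusers.

```python
import torch
import torch.nn as nn

# Stand-in for the VAE decoder's first convolution; channel sizes are made up.
# .half() converts its weight and bias to float16, like a VAE loaded in fp16.
conv_in = nn.Conv2d(16, 32, kernel_size=3, padding=1).half()

# Tensors are float32 by default, like latents that were never cast.
latents = torch.randn(1, 16, 8, 8)


def forward_or_error(layer: nn.Module, x: torch.Tensor) -> str:
    """Run `layer` on `x`; return "ok" on success or the RuntimeError text."""
    try:
        layer(x)
        return "ok"
    except RuntimeError as err:
        return str(err)


message = forward_or_error(conv_in, latents)
print(message)
```

The dtype check runs before the convolution itself, so this should reproduce on CPU as well. In the report, the same check fails at the VAE's `conv_in` during the first validation run.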

### Reproduction

```shell
export MODEL_NAME="black-forest-labs/FLUX.1-dev"
export INSTANCE_DIR="dog"
export OUTPUT_DIR="trained-flux-lora"

accelerate launch train_dreambooth_lora_flux.py \
  --pretrained_model_name_or_path=$MODEL_NAME  \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --mixed_precision="fp16" \
  --instance_prompt="a photo of sks dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --learning_rate=1e-5 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=500 \
  --validation_prompt="A photo of sks dog in a bucket" \
  --validation_epochs=25 \
  --seed="0" \
  --push_to_hub
```

### Logs

```shell
/workspace/fluxdiff/lib/python3.10/site-packages/accelerate/accelerator.py:488: FutureWarning: `torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead.
  self.scaler = torch.cuda.amp.GradScaler(**kwargs)
08/21/2024 11:49:06 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda

Mixed precision type: fp16

You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
You are using a model of type t5 to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
Downloading shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 6626.07it/s]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:12<00:00,  6.43s/it]
Fetching 3 files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 13039.29it/s]
{'axes_dims_rope'} was not found in config. Values will be initialized to default values.
08/21/2024 11:51:09 - INFO - __main__ - ***** Running training *****
08/21/2024 11:51:09 - INFO - __main__ -   Num examples = 5
08/21/2024 11:51:09 - INFO - __main__ -   Num batches each epoch = 5
08/21/2024 11:51:09 - INFO - __main__ -   Num Epochs = 250
08/21/2024 11:51:09 - INFO - __main__ -   Instantaneous batch size per device = 1
08/21/2024 11:51:09 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 4
08/21/2024 11:51:09 - INFO - __main__ -   Gradient Accumulation steps = 4
08/21/2024 11:51:09 - INFO - __main__ -   Total optimization steps = 500
Downloading shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 8184.01it/s]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:20<00:00, 10.45s/it]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:20<00:00, 11.01s/it]
Loaded tokenizer as CLIPTokenizer from `tokenizer` subfolder of black-forest-labs/FLUX.1-dev.
Loaded tokenizer_2 as T5TokenizerFast from `tokenizer_2` subfolder of black-forest-labs/FLUX.1-dev.
Loaded scheduler as FlowMatchEulerDiscreteScheduler from `scheduler` subfolder of black-forest-labs/FLUX.1-dev.
Loading pipeline components...: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00, 20.95it/s]
08/21/2024 11:51:35 - INFO - __main__ - Running validation... 
 Generating 4 images with prompt: A photo of sks dog in a bucket.
Traceback (most recent call last):
  File "/workspace/diffusers/examples/dreambooth/train_dreambooth_lora_flux.py", line 1857, in <module>
    main(args)
  File "/workspace/diffusers/examples/dreambooth/train_dreambooth_lora_flux.py", line 1780, in main
    images = log_validation(
  File "/workspace/diffusers/examples/dreambooth/train_dreambooth_lora_flux.py", line 188, in log_validation
    images = [pipeline(**pipeline_args, generator=generator).images[0] for _ in range(args.num_validation_images)]
  File "/workspace/diffusers/examples/dreambooth/train_dreambooth_lora_flux.py", line 188, in <listcomp>
    images = [pipeline(**pipeline_args, generator=generator).images[0] for _ in range(args.num_validation_images)]
  File "/workspace/fluxdiff/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/workspace/diffusers/src/diffusers/pipelines/flux/pipeline_flux.py", line 769, in __call__
    image = self.vae.decode(latents, return_dict=False)[0]
  File "/workspace/diffusers/src/diffusers/utils/accelerate_utils.py", line 46, in wrapper
    return method(self, *args, **kwargs)
  File "/workspace/diffusers/src/diffusers/models/autoencoders/autoencoder_kl.py", line 321, in decode
    decoded = self._decode(z).sample
  File "/workspace/diffusers/src/diffusers/models/autoencoders/autoencoder_kl.py", line 292, in _decode
    dec = self.decoder(z)
  File "/workspace/fluxdiff/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/workspace/fluxdiff/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/diffusers/src/diffusers/models/autoencoders/vae.py", line 291, in forward
    sample = self.conv_in(sample)
  File "/workspace/fluxdiff/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/workspace/fluxdiff/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/fluxdiff/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 458, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/workspace/fluxdiff/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 454, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (float) and bias type (c10::Half) should be the same
Steps:   0%|▍                                                                                                  | 2/500 [00:48<3:21:52, 24.32s/it, loss=0.485, lr=1e-5]
Traceback (most recent call last):
  File "/workspace/fluxdiff/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/workspace/fluxdiff/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
    args.func(args)
  File "/workspace/fluxdiff/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1106, in launch_command
    simple_launcher(args)
  File "/workspace/fluxdiff/lib/python3.10/site-packages/accelerate/commands/launch.py", line 704, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/workspace/fluxdiff/bin/python3', 'train_dreambooth_lora_flux.py', '--pretrained_model_name_or_path=black-forest-labs/FLUX.1-dev', '--instance_data_dir=dog', '--output_dir=trained-flux-lora', '--mixed_precision=fp16', '--instance_prompt=a photo of sks dog', '--resolution=512', '--train_batch_size=1', '--gradient_accumulation_steps=4', '--learning_rate=1e-5', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--max_train_steps=500', '--validation_prompt=A photo of sks dog in a bucket', '--validation_epochs=25', '--seed=0', '--push_to_hub']' returned non-zero exit status 1.
```
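The traceback shows the failure inside `self.vae.decode(latents, return_dict=False)` during validation. This suggests that the latents reaching the VAE are float32 while the VAE parameters are float16. A common pattern for this class of error is to cast the input to the module's parameter dtype before calling it (for example `latents.to(vae.dtype)`).

The sketch below shows that pattern in isolation. `call_in_param_dtype` is a hypothetical helper, not a diffusers API, and this is not a confirmed fix for `train_dreambooth_lora_flux.py`. To stay CPU-friendly, it uses a float32 layer with a float64 input, which follows the same mismatch pattern.

```python
import torch
import torch.nn as nn


def call_in_param_dtype(module: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: cast `x` to the dtype of `module`'s parameters, then call it.

    Mirrors the idea of casting latents to the VAE dtype before decoding;
    it is not taken from the diffusers source.
    """
    param_dtype = next(module.parameters()).dtype
    return module(x.to(param_dtype))


# float32 layer fed a float64 input: same kind of mismatch as fp16 VAE vs fp32 latents.
conv = nn.Conv2d(4, 8, kernel_size=3, padding=1)
x = torch.randn(1, 4, 8, 8, dtype=torch.float64)

# Calling the layer directly hits the dtype check.
try:
    conv(x)
    mismatch_raised = False
except RuntimeError:
    mismatch_raised = True

# Casting the input to the parameter dtype first avoids it.
out = call_in_param_dtype(conv, x)
print(mismatch_raised, out.dtype)  # True torch.float32
```

Casting the input is cheaper than upcasting the whole module, and it keeps the model in the precision requested by `--mixed_precision`.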


### System Info

- 🤗 Diffusers version: 0.31.0.dev0
- Platform: Linux-6.5.0-41-generic-x86_64-with-glibc2.35
- Running on Google Colab?: No
- Python version: 3.10.12
- PyTorch version (GPU?): 2.4.0+cu121 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Huggingface_hub version: 0.24.6
- Transformers version: 4.44.1
- Accelerate version: 0.33.0
- PEFT version: 0.12.0
- Bitsandbytes version: not installed
- Safetensors version: 0.4.4
- xFormers version: not installed
- Accelerator: NVIDIA A100 80GB PCIe, 81920 MiB
- Using GPU in script?: <fill in>
- Using distributed or parallel set-up in script?: <fill in>

### Who can help?

_No response_
