Describe the bug
I am trying to run the Flux LoRA quantization example from
https://github.com/huggingface/diffusers/tree/main/examples/research_projects/flux_lora_quantization
but training fails with: RuntimeError: The size of tensor a (4173) must match the size of tensor b (16461) at non-singleton dimension 2
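Judging by the traceback below, the failure happens inside diffusers.models.embeddings.apply_rotary_emb, where the rotary cos/sin tables are broadcast against the attention query. A minimal sketch that raises the same error with assumed shapes (24 heads, head dim 128 are guesses for illustration; only the two sequence lengths are taken from the error message):

import torch

query = torch.randn(1, 24, 4173, 128)  # (batch, heads, text + packed image tokens, head_dim)
cos = torch.randn(16461, 128)           # rotary table built for a different token count
out = query.float() * cos               # broadcast fails on dimension 2: 4173 vs 16461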
Reproduction
Steps to reproduce:
python compute_embeddings.py

accelerate launch --config_file=accelerate.yaml \
train_dreambooth_lora_flux_miniature.py \
--pretrained_model_name_or_path="black-forest-labs/FLUX.1-dev" \
--data_df_path="embeddings.parquet" \
--output_dir="yarn_art_lora_flux_nf4" \
--mixed_precision="fp16" \
--use_8bit_adam \
--weighting_scheme="none" \
--resolution=1024 \
--train_batch_size=1 \
--repeats=1 \
--learning_rate=1e-4 \
--guidance_scale=1 \
--report_to="wandb" \
--gradient_accumulation_steps=4 \
--gradient_checkpointing \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--cache_latents \
--rank=4 \
--max_train_steps=700 \
--seed="0"

(--pretrained_model_name_or_path points to a locally downloaded copy of FLUX.1-dev.)
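For what it's worth, the two sizes in the error differ by the same 77 tokens, and the remainder matches the packed vs. unpacked latent grid for --resolution=1024 (128x128 latents from the 8x VAE, packed 2x2 into 64x64 tokens). A quick arithmetic check, assuming a 77-token text sequence from compute_embeddings.py:

latent_h, latent_w = 128, 128                            # 1024 / 8 (VAE downsampling factor)
text_tokens = 77                                         # assumed prompt length used for the cached embeddings
packed_image_tokens = (latent_h // 2) * (latent_w // 2)  # 2x2 patch packing -> 4096

print(packed_image_tokens + text_tokens)                 # 4173  -> size of "tensor a"
print(latent_h * latent_w + text_tokens)                 # 16461 -> size of "tensor b"

So the query appears to carry the expected packed token count, while the rotary embedding looks like it was built for the unpacked latent grid.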
Logs
(env) root:~/tharun/Flux-HF# accelerate launch --config_file=accelerate.yaml \
train_dreambooth_lora_flux_miniature.py \
--pretrained_model_name_or_path="/root/tharun/black-forest-labs/FLUX.1-dev" \
--data_df_path="embeddings.parquet" \
--output_dir="yarn_art_lora_flux_nf4" \
--mixed_precision="fp16" \
--use_8bit_adam \
--weighting_scheme="none" \
--resolution=1024 \
--train_batch_size=1 \
--repeats=1 \
--learning_rate=1e-4 \
--guidance_scale=1 \
--report_to="wandb" \
--gradient_accumulation_steps=4 \
--gradient_checkpointing \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--cache_latents \
--rank=4 \
--max_train_steps=700 \
--seed="0"
10/29/2024 16:58:01 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Mixed precision type: fp16
Merged sharded checkpoints as `hf_quantizer` is not None.
{'axes_dims_rope'} was not found in config. Values will be initialized to default values.
Caching latents: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 18/18 [00:01<00:00, 10.35it/s]
wandb: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information.
wandb: Currently logged in as: tharunsivamani (tharunsivamani-student). Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.18.5
wandb: Run data is saved locally in /root/tharun/Flux-HF/wandb/run-20241029_165827-66cke5nx
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run feasible-bird-3
wandb: ⭐️ View project at https://wandb.ai/tharunsivamani-student/dreambooth-flux-dev-lora-nf4
wandb: 🚀 View run at https://wandb.ai/tharunsivamani-student/dreambooth-flux-dev-lora-nf4/runs/66cke5nx
10/29/2024 16:58:28 - INFO - __main__ - ***** Running training *****
10/29/2024 16:58:28 - INFO - __main__ - Num examples = 18
10/29/2024 16:58:28 - INFO - __main__ - Num batches each epoch = 18
10/29/2024 16:58:28 - INFO - __main__ - Num Epochs = 140
10/29/2024 16:58:28 - INFO - __main__ - Instantaneous batch size per device = 1
10/29/2024 16:58:28 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 4
10/29/2024 16:58:28 - INFO - __main__ - Gradient Accumulation steps = 4
10/29/2024 16:58:28 - INFO - __main__ - Total optimization steps = 700
Steps: 0%| | 0/700 [00:00<?, ?it/s]Traceback (most recent call last):
File "/root/tharun/Flux-HF/train_dreambooth_lora_flux_miniature.py", line 1183, in <module>
main(args)
File "/root/tharun/Flux-HF/train_dreambooth_lora_flux_miniature.py", line 1072, in main
model_pred = transformer(
File "/root/tharun/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/tharun/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/root/tharun/env/lib/python3.10/site-packages/accelerate/utils/operations.py", line 820, in forward
return model_forward(*args, **kwargs)
File "/root/tharun/env/lib/python3.10/site-packages/accelerate/utils/operations.py", line 808, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/root/tharun/env/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast
return func(*args, **kwargs)
File "/root/tharun/env/lib/python3.10/site-packages/diffusers/models/transformers/transformer_flux.py", line 490, in forward
encoder_hidden_states, hidden_states = torch.utils.checkpoint.checkpoint(
File "/root/tharun/env/lib/python3.10/site-packages/torch/_compile.py", line 32, in inner
return disable_fn(*args, **kwargs)
File "/root/tharun/env/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 632, in _fn
return fn(*args, **kwargs)
File "/root/tharun/env/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 496, in checkpoint
ret = function(*args, **kwargs)
File "/root/tharun/env/lib/python3.10/site-packages/diffusers/models/transformers/transformer_flux.py", line 485, in custom_forward
return module(*inputs)
File "/root/tharun/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/tharun/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/root/tharun/env/lib/python3.10/site-packages/diffusers/models/transformers/transformer_flux.py", line 175, in forward
attn_output, context_attn_output = self.attn(
File "/root/tharun/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/tharun/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/root/tharun/env/lib/python3.10/site-packages/diffusers/models/attention_processor.py", line 495, in forward
return self.processor(
File "/root/tharun/env/lib/python3.10/site-packages/diffusers/models/attention_processor.py", line 1872, in __call__
query = apply_rotary_emb(query, image_rotary_emb)
File "/root/tharun/env/lib/python3.10/site-packages/diffusers/models/embeddings.py", line 770, in apply_rotary_emb
out = (x.float() * cos + x_rotated.float() * sin).to(x.dtype)
RuntimeError: The size of tensor a (4173) must match the size of tensor b (16461) at non-singleton dimension 2
wandb: 🚀 View run feasible-bird-3 at: https://wandb.ai/tharunsivamani-student/dreambooth-flux-dev-lora-nf4/runs/66cke5nx
wandb: Find logs at: wandb/run-20241029_165827-66cke5nx/logs
Traceback (most recent call last):
File "/root/tharun/env/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/root/tharun/env/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
args.func(args)
File "/root/tharun/env/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1168, in launch_command
simple_launcher(args)
File "/root/tharun/env/lib/python3.10/site-packages/accelerate/commands/launch.py", line 763, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/root/tharun/env/bin/python', 'train_dreambooth_lora_flux_miniature.py', '--pretrained_model_name_or_path=/root/tharun/black-forest-labs/FLUX.1-dev', '--data_df_path=embeddings.parquet', '--output_dir=yarn_art_lora_flux_nf4', '--mixed_precision=fp16', '--use_8bit_adam', '--weighting_scheme=none', '--resolution=1024', '--train_batch_size=1', '--repeats=1', '--learning_rate=1e-4', '--guidance_scale=1', '--report_to=wandb', '--gradient_accumulation_steps=4', '--gradient_checkpointing', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--cache_latents', '--rank=4', '--max_train_steps=700', '--seed=0']' returned non-zero exit status 1.
System Info
- 🤗 Diffusers version: 0.32.0.dev0
- Platform: Linux-5.15.0-119-generic-x86_64-with-glibc2.35
- Running on Google Colab?: No
- Python version: 3.10.12
- PyTorch version (GPU?): 2.5.0+cu124 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Huggingface_hub version: 0.24.7
- Transformers version: 4.46.1
- Accelerate version: 1.0.1
- PEFT version: 0.13.2
- Bitsandbytes version: 0.44.1
- Safetensors version: 0.4.5
- xFormers version: not installed
- Accelerator: NVIDIA L40S, 46068 MiB
NVIDIA L40S, 46068 MiB
- Using GPU in script?:
- Using distributed or parallel set-up in script?:
Who can help?
No response