Describe the bug
When prompting the FluxPipeline class to generate an image with shape (1920, 1080), the output image shape is rounded down to (1920, 1072), which appears to be the nearest multiple of 16 rather than 8.
Since the FluxPipeline class accepts input sizes divisible by 8, I would expect them to remain consistent throughout the generation process.
From a quick look at the code, it seems that in the FluxPipeline._unpack_latents method the height and width are floor-divided (//) by the vae_scale_factor, which is 16.
I would love to understand why the scale factor is set the way it is here:
https://github.com/huggingface/diffusers/blob/89e4d6219805975bd7d253a267e1951badc9f1c0/src/diffusers/pipelines/flux/pipeline_flux.py#L197C9-L199C10
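For context, here is a minimal sketch of the rounding I believe is happening (an assumption based on reading the linked snippet, not a copy of the pipeline's actual code): floor-dividing by a scale factor of 16 and multiplying back drops any remainder, which turns the requested 1080 into the observed 1072.

```python
# Sketch of the suspected rounding (assumption based on the linked snippet,
# not the actual pipeline implementation).
vae_scale_factor = 16

def round_to_scale(dim: int, scale: int = vae_scale_factor) -> int:
    # Floor-divide, then multiply back: any remainder is dropped.
    return (dim // scale) * scale

print(round_to_scale(1080))  # 1072 -> matches the generated image height
print(round_to_scale(1920))  # 1920 -> already a multiple of 16, unchanged
```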
Reproduction
Here is minimal code to reproduce the bug; feel free to change the number of inference steps, as it should not affect the outcome of the test.
```python
from diffusers.pipelines import FluxPipeline
import torch

bf_repo = "black-forest-labs/FLUX.1-dev"
prompt = "Astronaut drinking coffee on the moon."
shape = (1920, 1080)

pipe = FluxPipeline.from_pretrained(bf_repo, torch_dtype=torch.bfloat16)
pipe.enable_sequential_cpu_offload()

image = pipe(
    prompt,
    height=shape[1],
    width=shape[0],
    num_inference_steps=28,
    generator=torch.Generator("cpu").manual_seed(123),
).images[0]

print(f"Prompted shape: {shape}")
print(f"Generated shape: {image.size}")
image.show()
```

Logs
```
Prompted shape: (1920, 1080)
Generated shape: (1920, 1072)
```

System Info
- 🤗 Diffusers version: 0.31.0
- Platform: Windows-10-10.0.22631-SP0
- Running on Google Colab?: No
- Python version: 3.10.6
- PyTorch version (GPU?): 2.4.1+cu124 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Huggingface_hub version: 0.26.2
- Transformers version: 4.44.2
- Accelerate version: 0.34.2
- PEFT version: not installed
- Bitsandbytes version: not installed
- Safetensors version: 0.4.5
- xFormers version: not installed
- Accelerator: NVIDIA GeForce RTX 4070 Laptop GPU, 8188 MiB
- Using GPU in script?: yes
- Using distributed or parallel set-up in script?: no