Describe the bug
I'm using the pretrained weights from Efficient-Large-Model/Sana_1600M_1024px_diffusers. I don't know if this is an issue with these weights, or if the implementation is broken.
Things I've observed so far:
- using fp16 calculations usually generates good enough results
- setting everything to fp32 (weights and autocast contexts) completely breaks the output
The attention output here is very different between the fp16 and fp32 versions.
The hidden_states are in the +/-5*10^5 range here (sometimes even higher; I've seen values as high as 1.3*10^6).
With fp16 calculations they overflow to inf, which is then clamped to (-65504, 65504) (about 6*10^4, more than an order of magnitude smaller). With fp32 calculations this clamping is skipped, so the output of that attention block is also different.
Enabling this clamping even for fp32 calculations fixes the issue, but this seems like a hack: the clamping operation looks like a safeguard, not an essential part of the attention calculation. Adding print(f"hidden_states: {hidden_states}") just before and after the clamping operation shows the issue pretty well.
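For reference, the "hack" described above amounts to something like the following (a standalone sketch, not the actual diffusers code; `clamp_hidden_states` is a hypothetical helper):

```python
import torch

FP16_MAX = torch.finfo(torch.float16).max  # 65504.0

def clamp_hidden_states(hidden_states: torch.Tensor) -> torch.Tensor:
    # Apply the fp16 safety clamp unconditionally, even for fp32 tensors,
    # reproducing the saturation that fp16 inference performs implicitly.
    return hidden_states.clamp(-FP16_MAX, FP16_MAX)

# fp32 activations in the problematic range stay within fp16 bounds:
x = torch.tensor([5e5, -1.3e6, 100.0], dtype=torch.float32)
print(clamp_hidden_states(x))  # 65504.0, -65504.0, 100.0
```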
Here are some examples (all using the same prompt/seed/cfg/sampler/etc., generated with the script in the Reproduction section below):

[example image: fp32 weights (without clamping)]

(tagging @lawrence-cj as the original author)
Reproduction
```python
import torch
from diffusers import SanaPipeline

if __name__ == '__main__':
    generator = torch.Generator(device="cuda")
    generator.manual_seed(42)

    pipe = SanaPipeline.from_pretrained("Efficient-Large-Model/Sana_1600M_1024px_diffusers")
    pipe.to("cuda")
    pipe.text_encoder.to(torch.bfloat16)
    pipe.transformer = pipe.transformer.to(torch.float32)  # <--- change the dtype here

    image = pipe(
        prompt='a water color painting of a bear',
        complex_human_instruction=None,
        generator=generator,
    )[0]
    image[0].save("debug/output.png")
```
Logs
No response
System Info
- 🤗 Diffusers version: 0.32.0.dev0
- Platform: Windows-10-10.0.22631-SP0
- Running on Google Colab?: No
- Python version: 3.10.8
- PyTorch version (GPU?): 2.5.1+cu124 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Huggingface_hub version: 0.26.2
- Transformers version: 4.47.0
- Accelerate version: 1.0.1
- PEFT version: not installed
- Bitsandbytes version: 0.44.1
- Safetensors version: 0.4.5
- xFormers version: 0.0.28.post3
- Accelerator: NVIDIA RTX A5000, 24564 MiB
- Using GPU in script?: CUDA / NVIDIA RTX A5000
- Using distributed or parallel set-up in script?: No
Who can help?
No response

