use fused multiply-add pointwise ops in chroma #8279
Merged
Micro-optimization for Chroma. Replaces separate multiply/add pointwise ops with `torch.addcmul` or `Tensor.addcmul_` where appropriate. Possibly less readable, but these ops are mathematically equivalent to what existed before. I don't know the full details of how `addcmul` gets lowered, but it should also be more numerically stable (FMA has infinite-precision intermediates). The intent is to mimic some of the performance gains that `torch.compile` gets through pointwise fusion. This isn't the area with the most gains for that (that honor seems to go to torch's awful RMSNorm implementation, which does something like a dozen separate kernel launches in a row), but it does reduce the number of pointwise ops by a fair bit, so it's worth doing. Triton itself also isn't always reliable about lowering separate `mul` and `add` instructions into `fma`, so doing this should likely help `torch.compile` itself out a bit too.
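As a rough sketch of the pattern (not the exact diff in this PR), here is a hypothetical scale-and-shift helper written both ways; `modulate_unfused`, `modulate_fused`, and `residual_add_fused_` are illustrative names, not functions from the repo:

```python
import torch

# Hypothetical example of the rewrite pattern: separate pointwise mul/add
# versus a single fused addcmul call. Function names are illustrative only.

def modulate_unfused(x: torch.Tensor, scale: torch.Tensor, shift: torch.Tensor) -> torch.Tensor:
    # eager mode launches separate kernels for the mul and the add
    return x * (1 + scale) + shift

def modulate_fused(x: torch.Tensor, scale: torch.Tensor, shift: torch.Tensor) -> torch.Tensor:
    # torch.addcmul(input, t1, t2) computes input + t1 * t2 in one op,
    # mathematically equivalent to the version above
    return torch.addcmul(shift, x, scale + 1)

def residual_add_fused_(acc: torch.Tensor, x: torch.Tensor, gate: torch.Tensor) -> torch.Tensor:
    # in-place flavor: acc += gate * x as a single Tensor.addcmul_ call
    return acc.addcmul_(x, gate)
```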
For future reference for any further efforts here: applying `@torch.compile` to `math.py:attention` and `rmsnorm.py:rms_norm` (which I have confirmed, in my case, is using the native torch implementation) seems to yield most of the performance gains that compiling either the whole model or individual modules in `chroma/layers.py` yields, so those two could probably use some attention for potential gains across multiple models.
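For example (a sketch only, using a generic reference RMSNorm rather than the repo's exact `rms_norm` implementation or signature):

```python
import torch

@torch.compile
def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # torch.compile fuses this chain of pointwise ops into far fewer kernels
    # than eager execution would launch
    rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)
    return x * rms * weight
```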