use fused multiply-add pointwise ops in chroma #8279

Merged: 1 commit into comfyanonymous:master on May 30, 2025

Conversation

@drhead (Contributor) commented May 26, 2025

Micro-optimization for Chroma. Replaces separate multiply-and-add pointwise ops with addcmul or Tensor.addcmul_ where appropriate. The result is possibly less readable, but these ops are mathematically equivalent to what existed before. I don't know the full details of how addcmul gets lowered, but it should also be more numerically stable, since a fused multiply-add keeps an infinite-precision intermediate. The intent is to mimic some of the performance gains that torch.compile gets through pointwise fusion. This isn't the area with the most gains for that (that honor seems to go to torch's RMSNorm implementation, which issues roughly a dozen separate kernel launches in a row), but it does cut the number of pointwise ops by a fair amount, so it's worth doing. Triton itself also isn't always reliable about lowering separate mul and add instructions into an fma, so this change should likely help torch.compile out a bit too.
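A minimal sketch of the kind of rewrite this describes, assuming the usual modulation and gated-residual patterns `x * (1 + scale) + shift` and `x + gate * attn_out` (tensor names here are illustrative, not taken from the actual diff):

```python
import torch

# Illustrative tensors; names are hypothetical, not from the PR.
x = torch.randn(2, 16, 64)
attn_out = torch.randn(2, 16, 64)
shift = torch.randn(64)
scale = torch.randn(64)
gate = torch.randn(64)

# Separate pointwise ops: each mul and add is its own kernel in eager mode.
modulated = x * (1 + scale) + shift
residual = x + gate * attn_out

# Fused equivalents: torch.addcmul(input, t1, t2) computes input + t1 * t2
# in one op, and the in-place variant avoids an extra allocation.
modulated_fused = torch.addcmul(shift, x, 1 + scale)
residual_fused = x.clone().addcmul_(attn_out, gate)

assert torch.allclose(modulated, modulated_fused, atol=1e-6)
assert torch.allclose(residual, residual_fused, atol=1e-6)
```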

For future reference for any further efforts here, applying @torch.compile to math.py:attention and rmsnorm.py:rms_norm (which I have confirmed in my case is using the native torch implementation) seems to yield most of the performance gains that compiling either the whole model or individual modules in chroma/layers.py yields, so those two probably could use some attention for potential gains across multiple models.
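A rough illustration of that follow-up idea (this is a generic hand-written RMSNorm, not the repo's actual rmsnorm.py code): wrapping a standalone helper in torch.compile lets the square/mean/rsqrt/multiply chain fuse into far fewer kernel launches than eager mode issues.

```python
import torch

# Hypothetical sketch: compile a standalone RMSNorm helper so its chain of
# pointwise ops is fused by the compiler instead of launched one by one.
@torch.compile
def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Written out explicitly so the pointwise structure is visible.
    variance = x.float().pow(2).mean(dim=-1, keepdim=True)
    x_normed = x.float() * torch.rsqrt(variance + eps)
    return (x_normed * weight.float()).to(x.dtype)
```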

@drhead requested a review from comfyanonymous as a code owner on May 26, 2025 01:59
@comfyanonymous (Owner) commented:
I have tried this before on the original Flux model and it broke MPS, so I will have to check whether they fixed that, or just give up on Mac support for this model.

@comfyanonymous merged commit 08b7cc7 into comfyanonymous:master on May 30, 2025
5 checks passed