
Preliminary support for sageattention3 #9047


Open
Panchovix wants to merge 12 commits into master

Conversation

@Panchovix commented Jul 25, 2025

Add preliminary support for SageAttention 3 (https://github.com/thu-ml/SageAttention and https://huggingface.co/jt-zhang/SageAttention3) as an attention backend.

Help with the implementation is welcome: when testing txt2img and img2img with SDXL on an RTX 5090 on Linux, I'm getting ~10% slower speeds than with SageAttention 2, so I'm probably missing something here. EDIT: Thanks to Kijai, I have updated per_block_mean to False, which should make a difference.

It would be great if someone could test with video.

Also, maybe it would be a good idea to have different flags in comfy.model_management and comfy.cli_args for the two versions? (A rough sketch of what that could look like is below.)

I'm not sure what the implications of this are, but it seems to make sage 3 actually run.
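
A minimal sketch of how the two flags could be declared, assuming an argparse setup like the one in comfy/cli_args.py (the flag names match the ones added later in this PR; the parser and group objects here are illustrative, not the actual ComfyUI code):

```python
# Illustrative only -- the real comfy/cli_args.py wiring may differ.
import argparse

parser = argparse.ArgumentParser()

# Keep the two backends mutually exclusive so only one can be selected at a time.
attn_group = parser.add_mutually_exclusive_group()
attn_group.add_argument("--use-sage-attention", action="store_true",
                        help="Use SageAttention 1.x/2.x as the attention backend.")
attn_group.add_argument("--use-sage-attention3", action="store_true",
                        help="Use SageAttention 3.x as the attention backend.")

args = parser.parse_args()
```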
@comfyanonymous (Owner)

I don't feel like filling out approval forms, so I'll wait until it's public before testing this.

For the line-ending check: if you sync with master, it should be fixed.

@pamparamm (Contributor)

@Panchovix I agree with your last point: it's probably better to separate sageattention and sageattention3 into different flags, as they have different signatures and expect different tensor shapes as well.
Also, I'm getting worse performance with sageattention3 compared to sageattention2 on SDXL on Windows, and switching per_block_mean doesn't help (I'm using the MSVC-compatible version from https://huggingface.co/jt-zhang/SageAttention3/discussions/5).
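
To make the signature mismatch concrete, here is a hedged sketch of a single wrapper over both backends. The SageAttention 2 call (sageattn with tensor_layout) follows its published API; the sage3_attention callable and its argument layout are placeholders, since the SageAttention 3 entry point isn't public yet, and only per_block_mean is taken from this thread:

```python
# Sketch of a unified wrapper over the two backends.
# Assumptions: q/k/v arrive as (batch, heads, seq_len, head_dim) ("HND").
# `sage3_attention` is a placeholder for whatever entry point SageAttention 3
# ends up exposing; only `per_block_mean` comes from this discussion.
import torch
from sageattention import sageattn  # SageAttention 1.x/2.x


def unified_sage_attention(q, k, v, use_sage3=False, sage3_attention=None):
    if not use_sage3:
        # SageAttention 2 takes the tensor layout explicitly.
        return sageattn(q, k, v, tensor_layout="HND", is_causal=False)

    # Hypothetical sage3 path: if the kernel expects (batch, seq_len, heads, head_dim),
    # transpose in and back out around the call.
    q3, k3, v3 = (t.transpose(1, 2).contiguous() for t in (q, k, v))
    out = sage3_attention(q3, k3, v3, per_block_mean=False)
    return out.transpose(1, 2).contiguous()
```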

@Panchovix (Author)

Okay, I have separated the SageAttention versions with different flags:

  • --use-sage-attention for sage 1.x/2.x
  • --use-sage-attention3 for sage 3.x

Now they work separately (so you can have both installed in the venv and use the one you want).
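
A rough sketch of how the selection could look on top of the two flags (the helper names are illustrative and assume comfy.cli_args exposes the parsed args object; ComfyUI's actual attention dispatch may differ):

```python
# Illustrative sketch of flag-based backend selection; not the actual ComfyUI code.
from comfy.cli_args import args


def sage_attention_enabled():
    return args.use_sage_attention


def sage_attention3_enabled():
    return args.use_sage_attention3


def pick_attention_backend(default_backend):
    # Prefer sage3 only when its flag is set; otherwise fall back to sage 1.x/2.x,
    # or to the default backend (e.g. PyTorch SDPA) when neither flag is given.
    if sage_attention3_enabled():
        return "sageattention3"
    if sage_attention_enabled():
        return "sageattention"
    return default_backend
```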

I also still get lower performance on SDXL for some reason, but I'm not sure whether my implementation is incorrect and that's what causes the perf regression.

Kijai seems to get better performance on video, as he mentioned in https://huggingface.co/jt-zhang/SageAttention3/discussions/3#6883b71543d2651d281c9cc0, but that appears to be a custom implementation from kijai/ComfyUI-WanVideoWrapper@a35eb7d.

Any ideas are welcome.
