feat: add fastercache and pab #92
Conversation
LGTM
Looks good! Mostly small comments before it can be merged.
I re-added my comments that were dropped in the last review ;)
| """ | ||
| imported_modules = self.import_algorithm_packages() | ||
| # set default values | ||
| temporal_attention_block_skip_range: Optional[int] = None | 
I would still recommend putting them in the smash config as constants and mentioning that they can be overridden for different architectures, with a link to the code file or the diffusers PR, so that the documentation is complete.
I agree. The best solution would be to use the pipeline-specific defaults unless the user explicitly specifies a parameter. Unfortunately, our SmashConfig interface doesn't support that logic right now. To implement it, we'd have to apply the same defaults across every pipeline. Given the large number of parameters, and the fact that most aren't straightforward to tune, I'll leave things as they are for now. This approach gives users strong out-of-the-box results and a straightforward interface, albeit with fewer tuning options.
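For context, here is a minimal sketch of the "pipeline-specific defaults unless the user overrides" logic described above. The `PIPELINE_DEFAULTS` table and `resolve_cache_defaults` helper are purely illustrative; nothing like this exists in the SmashConfig interface today, and the values shown are placeholders:

```python
# Hypothetical sketch of per-pipeline default resolution (not in this PR).
PIPELINE_DEFAULTS = {
    "FluxPipeline": {"spatial_attention_block_skip_range": 2},
    "CogVideoXPipeline": {
        "spatial_attention_block_skip_range": 2,
        "temporal_attention_block_skip_range": 4,
    },
}

def resolve_cache_defaults(pipeline_cls: str, user_config: dict) -> dict:
    # Start from the pipeline-specific defaults, then let any value the
    # user explicitly set take precedence.
    resolved = dict(PIPELINE_DEFAULTS.get(pipeline_cls, {}))
    resolved.update(user_config)
    return resolved
```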
Makes sense. As discussed async, let's make it clear in the PR description that this PR introduces pipeline-specific defaults that will need some iteration :)
```python
temporal_attention_block_skip_range: Optional[int] = None
spatial_attention_timestep_skip_range: Tuple[int, int] = (-1, 681)
temporal_attention_timestep_skip_range: Optional[Tuple[int, int]] = None
low_frequency_weight_update_timestep_range: Tuple[int, int] = (99, 901)
high_frequency_weight_update_timestep_range: Tuple[int, int] = (-1, 301)
unconditional_batch_skip_range: int = 5
unconditional_batch_timestep_skip_range: Tuple[int, int] = (-1, 641)
spatial_attention_block_identifiers: Tuple[str, ...] = (
    "blocks.*attn1",
    "transformer_blocks.*attn1",
    "single_transformer_blocks.*attn1"
)
temporal_attention_block_identifiers: Tuple[str, ...] = ("temporal_transformer_blocks.*attn1",)
attention_weight_callback = lambda _: 0.5  # noqa: E731
tensor_format: str = "BFCHW"
is_guidance_distilled: bool = False
```
I would still recommend putting them in the smash config as constants and mentioning that they can be overridden for different architectures, with a link to the code file or the diffusers PR, so that the documentation is complete :)
Let's go!
* fix: correct docstring in deepcache
* feat: add model checks
* feat: add pyramid attention broadcast (pab) cacher
* feat: add fastercache cacher
* tests: add flux tiny random fixture
* tests: add algorithms tests for pab and fastercache
* tests: add combination tests for pab and fastercache
* fix: add 1 as value for interval parameter
Description
This PR adds the PAB and FasterCache algorithms from diffusers (https://huggingface.co/docs/diffusers/main/api/cache).
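For reference, this is roughly how the two cachers are enabled through the linked diffusers API (adapted from the diffusers cache docs; exact argument names and defaults may differ between versions):

```python
import torch
from diffusers import CogVideoXPipeline, PyramidAttentionBroadcastConfig

pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Skip recomputing spatial attention on some timesteps and broadcast the
# cached outputs instead.
config = PyramidAttentionBroadcastConfig(
    spatial_attention_block_skip_range=2,
    spatial_attention_timestep_skip_range=(100, 800),
    current_timestep_callback=lambda: pipe.current_timestep,
)
pipe.transformer.enable_cache(config)
```

FasterCache is wired up the same way, with a `FasterCacheConfig` passed to `enable_cache`.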
Type of Change
How Has This Been Tested?
I manually tested both caching mechanisms across all supported Diffusers pipelines, visually inspected the resulting images and videos, and measured the inference time (the relative speedups match this benchmark). For FLUX, I evaluated every supported combination of algorithms. I also implemented new tests for each algorithm and for all of its combinations with the other algorithms.
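For reference, a rough sketch of the kind of wall-clock timing used for the speedup comparison (the `time_pipeline` helper and its arguments are illustrative, not part of this PR):

```python
import time
import torch

def time_pipeline(pipe, prompt: str, n_warmup: int = 1, n_runs: int = 3) -> float:
    """Average wall-clock seconds per call; run before and after enabling a cacher."""
    for _ in range(n_warmup):
        pipe(prompt, num_inference_steps=28)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(n_runs):
        pipe(prompt, num_inference_steps=28)
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / n_runs
```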
Checklist
Additional Notes
I tried to make these cachers work with compilation but ran into errors. The main problem is that these methods introduce a data-dependent condition around the attention layers, which hinders compilation.
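A minimal, hypothetical repro of that kind of failure (the function, shapes, and skip condition are illustrative, not the actual cacher code):

```python
import torch

def attention_or_cache(x, step, cached):
    # The cachers wrap attention in a branch like this: on "skip" steps the
    # cached output is reused instead of recomputing attention.
    if step % 2 == 0:  # data-dependent condition on a tensor
        return cached
    return torch.softmax(x, dim=-1)  # stand-in for the real attention

compiled = torch.compile(attention_or_cache, fullgraph=True)
# Branching on a tensor value cannot be captured in a single graph, so this
# raises a torch._dynamo error (or graph-breaks without fullgraph=True).
compiled(torch.randn(2, 8), torch.tensor(3), torch.randn(2, 8))
```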