[CVPR 2025] Attend to Not Attended: Structure-then-Detail Token Merging for Post-training DiT Acceleration

🔥 News

2025/09/28 🚀🚀 We released an improved version that integrates with the diffusers StableDiffusion3Pipeline and requires no modifications to the original diffusers code. This version no longer depends on attention maps and is fully compatible with xFormers.

Dependencies

Python>=3.9
CUDA>=11.8
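
The two minimums above can be checked before installing anything else. The following is a minimal sketch, not part of the repository: the helper names `parse_version` and `check_environment` are hypothetical, and it assumes that PyTorch, if installed, reports the CUDA version it was built against through `torch.version.cuda`.

```python
import sys

# Minimum versions taken from the Dependencies list in this README.
MIN_PYTHON = (3, 9)
MIN_CUDA = (11, 8)


def parse_version(text):
    """Turn a 'major.minor[.patch]' string such as '11.8' into (major, minor)."""
    return tuple(int(part) for part in text.split(".")[:2])


def check_environment():
    """Return a dict saying whether each requirement appears to be met.

    The 'cuda' entry is None when PyTorch is not installed, because this
    sketch has no other way to look up the CUDA version.
    """
    report = {"python": sys.version_info[:2] >= MIN_PYTHON}
    try:
        import torch  # optional: only used to read the CUDA build version
    except ImportError:
        report["cuda"] = None
        return report
    cuda = torch.version.cuda  # None for CPU-only PyTorch builds
    report["cuda"] = cuda is not None and parse_version(cuda) >= MIN_CUDA
    return report


if __name__ == "__main__":
    print(check_environment())
```

Note that `torch.version.cuda` describes the toolkit PyTorch was compiled with, which may differ from the driver version installed on the machine.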

🛠 Installation

git clone https://github.com/ICTMCG/SDTM.git

Environment Settings

Models and Datasets

We evaluated our method using the Hugging Face diffusers library. The related models and datasets can be downloaded from the following links:

Links:

Name	URL
COCO2017	http://images.cocodataset.org
PartiPrompts	https://github.com/google-research/parti
stabilityai/stable-diffusion-3-medium	https://huggingface.co/stabilityai/stable-diffusion-3-medium
stabilityai/stable-diffusion-3.5-large	https://huggingface.co/stabilityai/stable-diffusion-3.5-large
stabilityai/stable-diffusion-3.5-large-turbo	https://huggingface.co/stabilityai/stable-diffusion-3.5-large-turbo

We also provide a conda specification that reproduces our environment:

Environment (recommended)

cd SDTM
conda env create -f environment-sdtm.yml

🚀 Demo and Inference

Run SDTM

Stable Diffusion 3 (50 Steps)

sample images for visualization

bash demo.sh

sample images for evaluation

python sample.py \
  --caption-path "longest_captions.json" \
  --model-path "../../checkpoints/StableDiffusion/stable-diffusion-3-medium-diffusers" \
  --output-path "samples" \
  --height 1024 --width 1024 \
  --num_inference_steps 50 --guidance-scale 7.0 \
  --batch-size 4 --seed 0 \
  --tore-type SDTM
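
For orientation, the sampling flags in the command above correspond to standard arguments of the diffusers `StableDiffusion3Pipeline`. The sketch below shows only the unaccelerated baseline under that assumption; it does not apply SDTM, and the helper names `load_pipeline` and `generate` are hypothetical rather than functions from `sample.py`.

```python
# Sampling settings copied from the sample.py command in this README.
SAMPLING_KWARGS = {
    "height": 1024,
    "width": 1024,
    "num_inference_steps": 50,
    "guidance_scale": 7.0,
}


def load_pipeline(model_path, device="cuda"):
    """Load a Stable Diffusion 3 pipeline from a local path or Hub id."""
    # Imported lazily so this file can be read and imported without a GPU.
    import torch
    from diffusers import StableDiffusion3Pipeline

    pipe = StableDiffusion3Pipeline.from_pretrained(
        model_path, torch_dtype=torch.float16
    )
    return pipe.to(device)


def generate(pipe, prompt, seed=0):
    """Generate one image with a fixed seed (baseline, no token merging)."""
    import torch

    generator = torch.Generator(device="cpu").manual_seed(seed)
    result = pipe(prompt, generator=generator, **SAMPLING_KWARGS)
    return result.images[0]
```

A typical call would pass the same directory given to `--model-path`, then save the returned PIL image with `image.save(...)`. How token merging is switched on is defined by the repository's own scripts and is not reproduced here.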

multi-GPU image sampling for evaluation

torchrun --nproc_per_node=4 sample_ddp.py \
  --caption-path "longest_captions.json" \
  --model-path "../../checkpoints/StableDiffusion/stable-diffusion-3-medium-diffusers" \
  --output-path "samples" \
  --height 1024 --width 1024 \
  --num_inference_steps 50 --guidance-scale 7.0 \
  --batch-size 4 --seed 0 \
  --tore-type SDTM
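
When launched with `torchrun`, each process receives its rank and the total process count through the `RANK` and `WORLD_SIZE` environment variables. One common way to divide a caption list across processes is a strided split, sketched below. This illustrates the general pattern only; whether `sample_ddp.py` splits its captions this way is an assumption, and the function names are hypothetical.

```python
import os


def rank_and_world_size():
    """Read the values torchrun exports; fall back to a single process."""
    rank = int(os.environ.get("RANK", 0))
    world_size = int(os.environ.get("WORLD_SIZE", 1))
    return rank, world_size


def shard_for_rank(items, rank, world_size):
    """Return every world_size-th item, starting at this process's rank."""
    return items[rank::world_size]


def batched(items, batch_size):
    """Yield consecutive chunks of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


if __name__ == "__main__":
    captions = [f"caption {i}" for i in range(10)]  # placeholder data
    rank, world_size = rank_and_world_size()
    for batch in batched(shard_for_rank(captions, rank, world_size), 4):
        print(rank, batch)
```

With a strided split every caption is assigned to exactly one process, and the shard sizes differ by at most one, so no process idles for long at the end of a run.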

👍 Acknowledgements

Thanks to diffusers for their excellent work and the codebase upon which we build SDTM.
Thanks to ToMeSD for their contribution of the base token merging method.
Thanks to ALGM for their work, which inspired our structure-then-detail token merging approach.

📌 Citation

@inproceedings{fang2025attend,
  title={Attend to Not Attended: Structure-then-Detail Token Merging for Post-training DiT Acceleration},
  author={Fang, Haipeng and Tang, Sheng and Cao, Juan and Zhang, Enshuo and Tang, Fan and Lee, Tong-Yee},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={18083--18092},
  year={2025}
}

📧 Contact

If you have any questions, please email [email protected].
