
Demo video: `dLLM.decoding.demo.mov`
- [2025-10-14] Dream integration coming soon!

Extremely Greedy Parallel strategy: compare the predicted tokens with the reference answer and remask only the tokens that do not match.
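
As a concrete illustration, here is a minimal PyTorch sketch of this remasking rule; the function name, tensor shapes, and mask id are illustrative and not the repo's actual implementation.

```python
import torch

def egp_remask(pred_ids: torch.Tensor, ref_ids: torch.Tensor, mask_id: int) -> torch.Tensor:
    """Extremely Greedy Parallel remasking (illustrative sketch).

    Keeps every predicted token that already matches the reference answer
    and remasks only the positions that disagree.
    """
    keep = pred_ids == ref_ids  # positions already decoded correctly
    return torch.where(keep, pred_ids, torch.full_like(pred_ids, mask_id))

# Example: only the mismatching position (index 2) is remasked.
pred = torch.tensor([11, 42, 7, 99])
ref  = torch.tensor([11, 42, 8, 99])
print(egp_remask(pred, ref, mask_id=0))  # tensor([11, 42,  0, 99])
```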
Use a trained filter to decide, at each decoding step, which predicted tokens can be committed in parallel and which should stay masked.

Upon detection of an end-of-text token, the remaining masked positions are treated as padding and skipped, so no decoding steps are wasted on them.
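
Below is a simplified sketch of how such a filter could gate parallel unmasking inside the decoding loop; `denoiser`, `filter_model`, the acceptance threshold, and the end-of-text handling are illustrative placeholders rather than the repo's actual API.

```python
import torch

@torch.no_grad()
def filtered_parallel_decode(denoiser, filter_model, x, mask_id, eot_id,
                             max_steps=64, threshold=0.5):
    """Illustrative loop: the trained filter decides, per masked position,
    whether the current prediction can be committed in parallel."""
    for _ in range(max_steps):
        masked = x == mask_id
        if not masked.any():
            break
        probs = denoiser(x).softmax(-1)            # (seq_len, vocab_size)
        conf, pred = probs.max(-1)                 # per-token confidence and argmax token
        scores = filter_model(conf.unsqueeze(-1)).squeeze(-1).sigmoid()
        accept = masked & (scores > threshold)     # tokens the filter lets through
        if not accept.any():                       # always commit at least one token
            accept = masked & (conf == conf[masked].max())
        x = torch.where(accept, pred, x)
        # Once an end-of-text token is committed, treat the remaining masked
        # positions after it as padding so no further steps are spent on them.
        eot = (x == eot_id).nonzero(as_tuple=True)[0]
        if eot.numel() > 0:
            x[eot[0]:][x[eot[0]:] == mask_id] = eot_id
    return x
```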
Experiments on GSM8K, MATH, HumanEval, and MBPP show that our approach significantly improves throughput (up to 22.58× faster) while maintaining model accuracy, demonstrating strong generalization and practicality. Each method was evaluated at two generation lengths (256 and 1024) across the four datasets. Performance is measured with three metrics: TPS (tokens per second), speedup over the baseline, and accuracy. The highest throughput and speedup for each configuration are highlighted in bold.
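
For reference, the two throughput metrics relate as follows; the numbers below are made up purely for illustration.

```python
def tokens_per_second(num_tokens: int, seconds: float) -> float:
    """Decoding throughput (TPS)."""
    return num_tokens / seconds

# Speedup is the ratio of a method's TPS to the vanilla baseline's TPS.
baseline_tps = tokens_per_second(256, 64.0)  # illustrative baseline run
method_tps = tokens_per_second(256, 8.0)     # illustrative accelerated run
print(f"speedup = {method_tps / baseline_tps:.2f}x")  # 8.00x
```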
- Install dependencies: `pip install -r requirements.txt`
- Run the program
  - Test single questions: `python generate.py`
  - Run evaluations: `./eval_llada.sh`
- Generate training data: `./generate_training_data.sh`
- Train a filter: you can directly use `training.ipynb` to train new filter models with your own datasets (a minimal sketch is given after this list).
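
As a rough illustration of what training such a filter could involve, here is a self-contained PyTorch sketch; the tiny MLP, the per-token confidence feature, and the match/no-match labels are assumptions for illustration and may not match the actual notebook.

```python
import torch
import torch.nn as nn

# Illustrative filter: a tiny MLP mapping a per-token confidence feature to the
# probability that the predicted token already matches the final answer.
filter_model = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
optimizer = torch.optim.Adam(filter_model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

# Assumed training-data format: one confidence value per predicted token, with a
# 0/1 label saying whether that token matched the reference answer (as in the
# comparison strategy above). Replace these random placeholders with your data.
confidences = torch.rand(4096, 1)
labels = (confidences > 0.7).float()

for epoch in range(10):
    optimizer.zero_grad()
    loss = loss_fn(filter_model(confidences), labels)
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```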
We would like to thank the authors of LLaDA and Fast-dLLM for their excellent work and open-source contributions.
If you find our work useful, please consider citing our paper.
@misc{bao2025learningparallelacceleratingdiffusion,
title={Learning to Parallel: Accelerating Diffusion Large Language Models via Learnable Parallel Decoding},
author={Wenrui Bao and Zhiben Chen and Dan Xu and Yuzhang Shang},
year={2025},
eprint={2509.25188},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2509.25188},
}