
Learning to Parallel Decoding

arXiv: https://arxiv.org/abs/2509.25188

(Demo video: dLLM.decoding.demo.mov)

🔥News

  • [2025-10-14] Dream integration coming soon!

💡Methods

1. Learning to Parallel Decoding

(Figure: method overview)

The Extremely Greedy Parallel strategy compares the predicted tokens with the reference answer and remasks only the tokens that do not match. A trained filter $f_\theta$ simulates this strategy after each decoding step, scoring the predicted tokens and deciding which to keep and which to remask.
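
The sketch below illustrates one decoding step with such a filter. It is a minimal, hypothetical example: `filter_model`, `MASK_ID`, and `THRESHOLD` are placeholders, and feeding only the per-token confidence to the filter is an assumption for illustration, not the repository's actual feature set or API.

    # One parallel-decoding step with a learned remask filter (illustrative only).
    import torch

    MASK_ID = 126336    # hypothetical mask-token id
    THRESHOLD = 0.5     # hypothetical keep/remask threshold

    @torch.no_grad()
    def parallel_decode_step(model, filter_model, x):
        """x: (batch, seq_len) token ids; masked positions hold MASK_ID."""
        logits = model(x).logits                    # (B, L, V)
        probs = torch.softmax(logits, dim=-1)
        conf, pred = probs.max(dim=-1)              # per-token confidence and argmax

        masked = x == MASK_ID
        # The filter scores each candidate token; confidence stands in here
        # for whatever features the trained filter actually consumes.
        keep_score = torch.sigmoid(filter_model(conf.unsqueeze(-1))).squeeze(-1)
        keep = masked & (keep_score > THRESHOLD)

        x = x.clone()
        x[keep] = pred[keep]    # accept the filtered tokens in parallel
        return x                # remaining MASK_ID positions stay remasked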

2. End-of-Text Prediction

(Figure: End-of-Text Prediction)

Once an $[EoT]$ token is detected, we discard all tokens after it in the next diffusion step. When the specified output length is long (for example, 1024), this dynamically shrinks the input during the diffusion process and significantly reduces computation.
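
A minimal sketch of this truncation step is shown below; `EOT_ID` and the function signature are illustrative placeholders, not the repository's code.

    # Drop everything after the first decoded [EoT] before the next diffusion step.
    import torch

    EOT_ID = 126081    # hypothetical end-of-text token id

    def truncate_after_eot(x: torch.Tensor, prompt_len: int) -> torch.Tensor:
        """x: (1, seq_len) token ids. Returns a possibly shorter sequence."""
        gen = x[0, prompt_len:]
        hits = (gen == EOT_ID).nonzero(as_tuple=True)[0]
        if hits.numel() == 0:
            return x                              # no [EoT] decoded yet
        cut = prompt_len + int(hits[0]) + 1       # keep the [EoT] token itself
        return x[:, :cut]                         # smaller input for the next step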

🏎️Performance

Experiments on GSM8K, MATH, HumanEval, and MBPP show that our approach significantly improves throughput (up to 22.58× speedup) while maintaining model accuracy, demonstrating strong generalization and practicality. Each method was evaluated at two generation lengths (256 and 1024) across the four datasets. Performance is measured with three metrics: TPS (tokens/sec), speedup, and accuracy. The highest throughput and speedup values for each configuration are highlighted in bold.

(Table: performance results)

How to run

  1. Install dependencies

     pip install -r requirements.txt

  2. Run the program

     • Test a single question:

       python generate.py

     • Run evaluations:

       ./eval_llada.sh


Generate data for training

./generate_training_data.sh

Training the Filter

You can directly use training.ipynb to train new filter models with your own datasets.
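
If you prefer a script over the notebook, the snippet below sketches the idea: the filter is trained as a binary classifier that imitates the Extremely Greedy Parallel decisions (label 1 if the predicted token matches the reference, 0 if it should be remasked). The MLP architecture and the single confidence feature are assumptions for illustration, not the notebook's exact setup.

    import torch
    import torch.nn as nn

    class Filter(nn.Module):
        """Tiny MLP scoring whether a predicted token should be kept."""
        def __init__(self, in_dim: int = 1, hidden: int = 64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(in_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, feats):                 # feats: (N, in_dim)
            return self.net(feats).squeeze(-1)    # raw keep/remask logits

    def train_filter(feats, labels, epochs: int = 10, lr: float = 1e-3):
        """feats: (N, in_dim) float tensor; labels: (N,) 0/1 tensor."""
        model = Filter(in_dim=feats.shape[-1])
        opt = torch.optim.AdamW(model.parameters(), lr=lr)
        loss_fn = nn.BCEWithLogitsLoss()
        for _ in range(epochs):
            opt.zero_grad()
            loss = loss_fn(model(feats), labels.float())
            loss.backward()
            opt.step()
        return model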

Acknowledgments

We would like to thank the authors of LLaDA and Fast-dLLM for their excellent work and open-source contributions.

Citation

If you find our work useful, please consider citing our paper.

@misc{bao2025learningparallelacceleratingdiffusion,
      title={Learning to Parallel: Accelerating Diffusion Large Language Models via Learnable Parallel Decoding}, 
      author={Wenrui Bao and Zhiben Chen and Dan Xu and Yuzhang Shang},
      year={2025},
      eprint={2509.25188},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2509.25188}, 
}
