[Neuron][Kernel] Vectorize KV cache load in FlashPagedAttention to maximize DMA bandwidth #13245

Merged: simon-mo merged 12 commits into vllm-project:main from lingfanyu:fast_vectorized_dma on Feb 21, 2025
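For context, the PR title describes vectorizing the KV cache load in the paged-attention kernel so that DMA transfers move whole cache blocks instead of per-token slices. Below is a minimal NumPy sketch of that general idea, assuming a (num_blocks, block_size, head_dim) cache layout and a per-sequence block table; the names, shapes, and helpers are illustrative assumptions, not the kernel's actual NKI code.

```python
# Illustrative sketch only (not the PR's NKI kernel): contrasts per-token KV gathers
# with whole-block contiguous copies, so each DMA-style transfer moves a large
# vectorized chunk. All names and shapes here are hypothetical.
import numpy as np

BLOCK_SIZE = 128   # tokens per KV cache block (assumed)
HEAD_DIM = 64      # per-head dimension (assumed)

def load_kv_per_token(kv_cache, block_table, num_tokens):
    """Scalar-style load: one small copy per token -> many tiny transfers."""
    out = np.empty((num_tokens, HEAD_DIM), dtype=kv_cache.dtype)
    for i in range(num_tokens):
        block_id = block_table[i // BLOCK_SIZE]
        offset = i % BLOCK_SIZE
        out[i] = kv_cache[block_id, offset]   # tiny, bandwidth-inefficient copy
    return out

def load_kv_vectorized(kv_cache, block_table, num_tokens):
    """Vectorized load: one contiguous copy per block -> few large transfers."""
    num_blocks = (num_tokens + BLOCK_SIZE - 1) // BLOCK_SIZE
    out = np.empty((num_blocks * BLOCK_SIZE, HEAD_DIM), dtype=kv_cache.dtype)
    for b in range(num_blocks):
        block_id = block_table[b]
        # Copy the whole block at once; on hardware this corresponds to one wide
        # transfer descriptor instead of BLOCK_SIZE small ones.
        out[b * BLOCK_SIZE:(b + 1) * BLOCK_SIZE] = kv_cache[block_id]
    return out[:num_tokens]
```

The design point is that DMA engines reach peak bandwidth only with large contiguous transfers, so batching the load at block granularity (rather than token granularity) is what "vectorize" refers to in the title.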

[Neuron][Kernel] Vectorize KV cache load in FlashPagedAttention to maximize DMA bandwidth#13245
simon-mo merged 12 commits intovllm-project:mainfrom
lingfanyu:fast_vectorized_dma

Commits

Commits on Feb 13, 2025
Commits on Feb 14, 2025
Commits on Feb 18, 2025
Commits on Feb 19, 2025