Commit 1e3598e

Use the optimized block sizes after tuning the kernel. (#14329)
1 parent f7a6bd0 commit 1e3598e

1 file changed: +2 -2 lines changed

vllm/v1/attention/backends/pallas.py

Lines changed: 2 additions & 2 deletions
@@ -12,8 +12,8 @@
 from vllm.attention.backends.utils import CommonAttentionState

 # These are the 2 tunable parameters of the paged attention Pallas kernel.
-NUM_QUERIES_PER_BLOCK = 32
-NUM_KV_PAGES_PER_BLOCK = 128
+NUM_QUERIES_PER_BLOCK = 16
+NUM_KV_PAGES_PER_BLOCK = 256


 class PallasAttentionBackend(AttentionBackend):
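
To make the effect of the two constants concrete, here is a minimal sketch of how the block counts would change, assuming the kernel tiles the query dimension and the KV-page dimension by these sizes with ceiling division. The helper names (`cdiv`, `block_counts`) and the workload of 64 queries and 512 KV pages are hypothetical and do not come from vLLM or the Pallas kernel.

```python
# Hypothetical illustration only; not code from vllm/v1/attention/backends/pallas.py.
# Values before and after commit 1e3598e.
OLD = {"NUM_QUERIES_PER_BLOCK": 32, "NUM_KV_PAGES_PER_BLOCK": 128}
NEW = {"NUM_QUERIES_PER_BLOCK": 16, "NUM_KV_PAGES_PER_BLOCK": 256}


def cdiv(a: int, b: int) -> int:
    """Ceiling division for positive integers."""
    return (a + b - 1) // b


def block_counts(num_queries: int, num_kv_pages: int, cfg: dict) -> tuple:
    """Return (query blocks, KV-page blocks) under the assumed tiling."""
    return (
        cdiv(num_queries, cfg["NUM_QUERIES_PER_BLOCK"]),
        cdiv(num_kv_pages, cfg["NUM_KV_PAGES_PER_BLOCK"]),
    )


# Assumed example workload: 64 queries, 512 KV pages.
old_counts = block_counts(64, 512, OLD)
new_counts = block_counts(64, 512, NEW)
print("old:", old_counts)  # old: (2, 4)
print("new:", new_counts)  # new: (4, 2)
```

Both settings cover the same number of query and KV-page pairs per block (32 × 128 = 16 × 256 = 4096); the change trades fewer queries per block for more KV pages per block. The commit message attributes the choice to kernel tuning and gives no further detail.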
