Closed
Description
Hello, I am new to vLLM and would like to know how to set max_num_batched_tokens and max_num_seqs to achieve maximum inference performance. What is the relationship between max_num_batched_tokens and max_num_seqs? Also, why does the number of output tokens change when I set different values for max_num_batched_tokens and max_num_seqs? The totals may be inconsistent.
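To make the question concrete: both settings cap what the scheduler can put into one iteration, max_num_seqs by number of sequences and max_num_batched_tokens by number of tokens. The sketch below is a toy model of how two such caps interact, not vLLM's actual scheduler. The `Request` class, the `schedule_step` function, and the FIFO admission rule are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    prompt_tokens: int


def schedule_step(waiting, max_num_batched_tokens, max_num_seqs):
    """Toy model (hypothetical, not vLLM code): admit requests in FIFO
    order until either the sequence cap or the token cap would be exceeded."""
    admitted = []
    token_budget = max_num_batched_tokens
    for req in waiting:
        if len(admitted) >= max_num_seqs:
            break  # sequence cap reached
        if req.prompt_tokens > token_budget:
            break  # token cap reached; FIFO, so do not skip ahead
        admitted.append(req)
        token_budget -= req.prompt_tokens
    return admitted


waiting = [
    Request("a", 300),
    Request("b", 500),
    Request("c", 200),
    Request("d", 100),
]

# Large token budget, small sequence cap: the sequence cap binds first.
seq_bound = schedule_step(waiting, max_num_batched_tokens=4096, max_num_seqs=2)

# Small token budget, large sequence cap: the token cap binds first.
token_bound = schedule_step(waiting, max_num_batched_tokens=1000, max_num_seqs=256)

print([r.name for r in seq_bound])
print([r.name for r in token_bound])
```

In this toy model, whichever cap is hit first determines the batch size, so raising one setting alone has no effect once the other is the bottleneck.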