
max_num_batched_tokens and max_num_seqs values #2492

@isRambler

Description

Hello, I am new to vLLM and want to know how to set `max_num_batched_tokens` and `max_num_seqs` to achieve maximum inference performance. What is the relationship between `max_num_batched_tokens` and `max_num_seqs`? Also, why can the total number of output tokens be inconsistent when I run with different `max_num_batched_tokens` and `max_num_seqs` settings?
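For context on the relationship between the two settings: in vLLM, `max_num_seqs` caps how many sequences can be processed in one scheduling iteration, while `max_num_batched_tokens` caps the total number of tokens across those sequences in that iteration, so whichever limit is hit first bounds the batch. Below is a minimal, purely illustrative sketch of that interaction (my own simplified greedy admission loop, not vLLM's actual scheduler code; the function name `schedule_batch` is hypothetical):

```python
# Illustrative sketch (NOT vLLM's implementation): per scheduling step,
# a batch may hold at most `max_num_seqs` sequences AND at most
# `max_num_batched_tokens` tokens in total, whichever cap binds first.

def schedule_batch(waiting, max_num_seqs, max_num_batched_tokens):
    """Greedily admit waiting sequences until either cap is hit.

    `waiting` is a list of token counts, one per waiting sequence.
    Returns the indices of the admitted sequences.
    """
    batch = []
    token_budget = max_num_batched_tokens
    for i, num_tokens in enumerate(waiting):
        if len(batch) >= max_num_seqs or num_tokens > token_budget:
            break  # one of the two limits is exhausted
        batch.append(i)
        token_budget -= num_tokens
    return batch

# With a 512-token budget and room for up to 4 sequences, the token
# budget runs out first, so only two sequences are admitted:
print(schedule_batch([200, 200, 200, 50],
                     max_num_seqs=4,
                     max_num_batched_tokens=512))  # -> [0, 1]
```

In practice both values are passed as engine arguments (e.g. to `vllm.LLM`); raising `max_num_batched_tokens` trades per-step latency for throughput, and `max_num_seqs` bounds how many requests can share a step regardless of how small they are.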
