Issue encountered
With the vLLM backend, there is currently no way for us to control the batch size defined here, and the vLLM model config does not expose a way to set a specific batch size. However, vLLM itself lets us control the maximum number of sequences per batch (effectively the batch size) directly, as seen in examples such as this.
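For reference, a minimal sketch of how vLLM exposes this knob: `max_num_seqs` is a real vLLM engine argument that caps how many sequences the engine schedules in a batch (the model name below is just an illustrative placeholder).

```python
from vllm import LLM, SamplingParams

# max_num_seqs caps the number of sequences vLLM schedules per batch,
# which is the effective batch-size limit we want to control.
llm = LLM(model="facebook/opt-125m", max_num_seqs=64)
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=16))
```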
Solution/Feature
- Propagate the `max_num_seqs` parameter into the initialization of the vLLM model (see the sketch below).
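A hypothetical sketch of the propagation, assuming a backend wrapper class; `VLLMBackend` and its signature are illustrative assumptions, not the project's actual API — only the vLLM call itself is real.

```python
from typing import Optional

from vllm import LLM


class VLLMBackend:
    """Hypothetical wrapper that forwards a batch-size cap to vLLM."""

    def __init__(self, model_id: str, max_num_seqs: Optional[int] = None) -> None:
        engine_kwargs = {}
        if max_num_seqs is not None:
            # Forward the cap on concurrent sequences straight to vLLM,
            # which uses it as the effective batch-size limit.
            engine_kwargs["max_num_seqs"] = max_num_seqs
        self.llm = LLM(model=model_id, **engine_kwargs)


# Usage: callers can now bound the vLLM batch size at construction time.
backend = VLLMBackend("facebook/opt-125m", max_num_seqs=32)
```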
Possible alternatives
- An alternative is to implement batching ourselves, but that is overkill since the vLLM backend already supports it.