Awarno/haproxy #241

AWarno · 2025-09-29T11:45:15Z

Enabling Multi-Instance Deployment with HAProxy

Why HAProxy

HAProxy is a lightweight, reliable, and widely used load balancer. It is independent of the server implementation, so it generalizes well across server types. The vLLM documentation officially recommends using an external load balancer (see vLLM Data Parallel Deployment); its example uses NGINX, but HAProxy should work similarly.
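
To make the idea concrete, here is a minimal sketch of what generating a load-balancer config for several server instances could look like. It is not the template shipped in this PR: the function name `render_haproxy_cfg`, the section names, the timeouts, the listen port, and the `/health` path are all assumptions chosen for illustration. Only standard HAProxy directives are used (`bind`, `balance roundrobin`, `option httpchk`, `server ... check`).

```python
from typing import Sequence


def render_haproxy_cfg(
    backends: Sequence[str],
    listen_port: int = 8000,      # assumed port for the single public endpoint
    health_path: str = "/health",  # assumed health endpoint of each instance
) -> str:
    """Render a minimal haproxy.cfg that spreads HTTP requests over backends.

    backends: "host:port" strings, one per model server instance.
    """
    if not backends:
        raise ValueError("at least one backend is required")
    lines = [
        "defaults",
        "    mode http",
        "    timeout connect 10s",
        # Long timeouts are an assumption: generation requests can be slow.
        "    timeout client 1h",
        "    timeout server 1h",
        "",
        "frontend llm_frontend",
        f"    bind *:{listen_port}",
        "    default_backend llm_servers",
        "",
        "backend llm_servers",
        "    balance roundrobin",
        # Mark an instance as down when its health endpoint stops answering.
        f"    option httpchk GET {health_path}",
    ]
    for index, address in enumerate(backends):
        lines.append(f"    server instance{index} {address} check")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_haproxy_cfg(["127.0.0.1:8001", "127.0.0.1:8002"]))
```

Clients then talk only to the frontend port, and adding an instance means adding one `server` line, regardless of which server type runs behind it.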

Alternative Solutions

Ray
Useful for multi-node deployments when a model is too large for a single node. It can also be used for multi-instance setups, but that requires knowing how to launch and manage each server type individually (vLLM and SGLang, for example, may use different CLI arguments for this), so it does not generalize as well as an external load balancer. We may still want to provide an example of using it for multi-node deployment of large models.
LiteLLM
Offers backend orchestration but is generally overkill for simple load balancing. The project evolves quickly, which may affect stability.
NGINX
Very similar to HAProxy for this use case, and the load balancer used in the example in the vLLM documentation (see vLLM Data Parallel Deployment). In my experience, however, HAProxy is slightly simpler to use in practice.

Literature

TODO

Run on longer tasks to validate stability and performance (only IFEval has been checked so far).
Check that the HAProxy template is correctly included in the pip wheel (consider renaming it).
Write documentation.
Fix the dataclass in types.
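
For the wheel check in the TODO list, a small sketch like the following may help. It relies only on the fact that a wheel is a zip archive; the helper name `find_in_wheel` and the example file name are hypothetical and not taken from this PR.

```python
import zipfile
from typing import List


def find_in_wheel(wheel_path: str, suffix: str) -> List[str]:
    """Return the archive members of a wheel whose path ends with `suffix`."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return [name for name in wheel.namelist() if name.endswith(suffix)]
```

An empty result for the template's file name after building the wheel would indicate that the template is missing from the package data.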

Next Steps

Add a multi-node deployment example using Ray server. This will likely just require creating one example configuration file under examples/.

copy-pr-bot · 2025-09-29T11:45:18Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

1. Add total stats.
2. Add reasoning token stats (if provided): see https://platform.openai.com/docs/guides/reasoning, or "reasoning_tokens" in usage (completion_tokens_details, output_tokens_details).
3. Make stats cache-resistant: do not include stats if the response is from cache.

Signed-off-by: Anna Warno <[email protected]>
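
The three points in this commit message can be sketched roughly as follows. This is an illustration, not the code from that commit: the function names, the `from_cache` flag, and the shape of the `totals` dictionary are assumptions. The nested keys `completion_tokens_details` and `output_tokens_details` are the ones named in the message.

```python
from typing import Any, Dict, Mapping, Optional


def reasoning_tokens(usage: Mapping[str, Any]) -> Optional[int]:
    """Return the reasoning token count from a usage payload, if present."""
    # Nested form: the count sits inside a "*_tokens_details" object.
    for key in ("completion_tokens_details", "output_tokens_details"):
        details = usage.get(key)
        if isinstance(details, Mapping) and "reasoning_tokens" in details:
            return details["reasoning_tokens"]
    # Flat form: "reasoning_tokens" directly in usage (None if absent).
    return usage.get("reasoning_tokens")


def add_to_totals(
    totals: Dict[str, int], usage: Mapping[str, Any], from_cache: bool
) -> Dict[str, int]:
    """Accumulate stats, skipping responses that were served from cache."""
    if from_cache:
        return totals  # cached responses must not inflate the stats
    totals["responses"] = totals.get("responses", 0) + 1
    count = reasoning_tokens(usage)
    if count is not None:
        totals["reasoning_tokens"] = totals.get("reasoning_tokens", 0) + count
    return totals
```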

This is a very basic migration of the README content, plus a minimal toctree on the home index page so that the Sphinx site produces a sidebar. The sidebar will mature and break out in the future into sections such as About, Get Started, etc. We will also add more sections/cards to this page after all other basic edits have been checked in, so it won't be a direct copy of the README; instead, it will become a proper docs site home page.

Signed-off-by: Lawrence Lane <[email protected]> Signed-off-by: L.B. <[email protected]> Co-authored-by: jgerh <[email protected]> Signed-off-by: Anna Warno <[email protected]>

Docs update --------- Signed-off-by: Anna Warno <[email protected]> Signed-off-by: Oliver Koenig <[email protected]> Signed-off-by: Wojciech Prazuch <[email protected]> Signed-off-by: AWarno <[email protected]> Co-authored-by: Oliver Koenig <[email protected]> Co-authored-by: Alexey Gronskiy <[email protected]> Co-authored-by: Wojciech Prazuch <[email protected]> Signed-off-by: Anna Warno <[email protected]>

AWarno · 2025-10-01T12:15:00Z

/ok to test 06e6a85

AWarno requested review from a team and agronskiy as code owners September 29, 2025 11:45

ko3n1g and others added 24 commits September 29, 2025 13:56

beep boop 🤖: Bumping nemo_evaluator_launcher to v0.1.4

1326d00

Signed-off-by: Oliver Koenig <[email protected]> Signed-off-by: Anna Warno <[email protected]>

beep boop 🤖: Bumping nemo_evaluator to v0.1.4

3a846a9

Signed-off-by: Oliver Koenig <[email protected]> Signed-off-by: Anna Warno <[email protected]>

chore(ci/release): enable cron run (#216)

098f3fb

Signed-off-by: Anna Warno <[email protected]>

(feat) Configure request method for progress tracking requests (#213)

da80190

Signed-off-by: Wojciech Prazuch <[email protected]> Signed-off-by: Anna Warno <[email protected]>

beep boop 🤖: Bumping nemo_evaluator_launcher to v0.1.5

f3e36e7

Signed-off-by: Oliver Koenig <[email protected]> Signed-off-by: Anna Warno <[email protected]>

beep boop 🤖: Bumping nemo_evaluator to v0.1.5

7f77d93

Signed-off-by: Oliver Koenig <[email protected]> Signed-off-by: Anna Warno <[email protected]>

Update overview.md (#210)

6e81164

checkbox added Signed-off-by: AWarno <[email protected]> Signed-off-by: Anna Warno <[email protected]>

feat(multi-instance): haproxy

75aeeb6

Signed-off-by: Anna Warno <[email protected]>

fix(health-url): health url fixed

86a557d

Signed-off-by: Anna Warno <[email protected]>

fix(conflict): fix conflict

0681135

Signed-off-by: Anna Warno <[email protected]>

beep boop 🤖: Bumping nemo_evaluator to v0.1.4

710ca95

Signed-off-by: Oliver Koenig <[email protected]>

beep boop 🤖: Bumping nemo_evaluator_launcher to v0.1.5

fc772a2

Signed-off-by: Oliver Koenig <[email protected]> Signed-off-by: Anna Warno <[email protected]>

beep boop 🤖: Bumping nemo_evaluator to v0.1.5

a5bee56

Signed-off-by: Oliver Koenig <[email protected]> Signed-off-by: Anna Warno <[email protected]>

fix(executors): migrate to eval-factory cmd (#229)

4884615

It unblocks us to use new Eval Factory containers in the launcher — they don't have `nv-eval`/`nv_eval` alias anymore. Signed-off-by: Piotr Januszewski <[email protected]> Signed-off-by: Anna Warno <[email protected]>

(chore) Update container versions (#230)

d6cf567

Signed-off-by: Wojciech Prazuch <[email protected]> Signed-off-by: Anna Warno <[email protected]>

(chore) Switch to referring to latest in the docs (#231)

de17a9e

Signed-off-by: Wojciech Prazuch <[email protected]> Signed-off-by: Anna Warno <[email protected]>

(chore) Revert switch to latest (#232)

6aa2f8c

Signed-off-by: Wojciech Prazuch <[email protected]> Signed-off-by: Anna Warno <[email protected]>

beep boop 🤖: Bumping nemo_evaluator to v0.1.6

4375a60

Signed-off-by: Oliver Koenig <[email protected]> Signed-off-by: Anna Warno <[email protected]>

beep boop 🤖: Bumping nemo_evaluator_launcher to v0.1.6

a1fb564

Signed-off-by: Oliver Koenig <[email protected]> Signed-off-by: Anna Warno <[email protected]>

ci(fix): Dependabot (#236)

c78bb73

Signed-off-by: oliver könig <[email protected]> Signed-off-by: Anna Warno <[email protected]>

Add changes from PR 215 to the README (#239)

1452fce

Signed-off-by: Marta Stepniewska-Dziubinska <[email protected]> Signed-off-by: Anna Warno <[email protected]>

AWarno force-pushed the awarno/haproxy branch from 84c264e to 1452fce Compare September 29, 2025 12:55

Merge branch 'main' into awarno/haproxy

3e6c9d2

AWarno marked this pull request as draft September 29, 2025 13:19

Merge branch 'main' into awarno/haproxy

06e6a85

AWarno marked this pull request as ready for review October 1, 2025 12:12

AWarno marked this pull request as draft October 2, 2025 17:21

Merge branch 'main' into awarno/haproxy

aa87ded
