[Core] Expose API endpoint /is_sleeping (#14312)
Conversation
youkaichao left a comment:
The motivation sounds good to me. @njhill, can you help take a look?
njhill left a comment:
The changes themselves look fine to me; I'm just unsure how commonly this might be needed (same as @youkaichao's thought), especially if we ensure that the sleep/wakeup operations are idempotent (not sure if that's currently the case, but it should be trivial otherwise).
Could we make a change to just fail the requests in this case rather than crashing the engine? That could then also serve as the probe mechanism if needed.
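
As an aside, here is a hedged sketch of the "fail requests while sleeping" idea mentioned above (not something this PR implements); the middleware approach and the `engine_client`/`is_sleeping()` names are assumptions for illustration:

```python
# Illustrative only: reject inference requests with 503 while the engine is
# asleep, instead of letting them reach (and potentially crash) the engine.
# `app.state.engine_client` and `is_sleeping()` are assumed names.
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()


@app.middleware("http")
async def reject_when_sleeping(request: Request, call_next):
    engine = getattr(request.app.state, "engine_client", None)
    if (engine is not None and request.url.path.startswith("/v1/")
            and await engine.is_sleeping()):
        return JSONResponse(status_code=503,
                            content={"error": "Engine is sleeping"})
    return await call_next(request)
```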
waltforme left a comment:
Thanks for @njhill's review! I absolutely agree that the suggested 'fail requests when sleeping' feature is worth doing. I still think the probe implemented in this PR is necessary even once that feature is done. Think of it from a user's perspective: the user could be a person who can't remember the sleeping status of a fleet of vLLM instances, or a k8s controller that just crashed/restarted and is trying to rebuild its global state. It is more natural to query an API endpoint directly than to send an inference request to each vLLM instance and observe whether it fails or succeeds. Moreover, if an inference-request-as-a-probe is sent to an awake engine, that request will be served and consume extra resources. So IMHO, an API endpoint is not only more natural but also more efficient.
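
To make the controller scenario above concrete, here is a minimal hedged sketch of rebuilding a fleet's sleep state by querying the endpoint; the instance addresses and the `{"is_sleeping": <bool>}` response shape are assumptions for this example:

```python
# Hypothetical sketch: a controller rebuilds the sleep state of a fleet of
# vLLM instances by querying the read-only endpoint directly, instead of
# sending an inference request to each one. Endpoint path and response
# shape are assumptions for illustration.
import requests

INSTANCES = ["http://vllm-0:8000", "http://vllm-1:8000"]  # example addresses


def rebuild_sleep_state(base_urls: list[str]) -> dict[str, bool]:
    state: dict[str, bool] = {}
    for url in base_urls:
        resp = requests.get(f"{url}/is_sleeping", timeout=5)
        resp.raise_for_status()
        state[url] = bool(resp.json()["is_sleeping"])
    return state


if __name__ == "__main__":
    print(rebuild_sleep_state(INSTANCES))
```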
njhill left a comment:
@waltforme actually, could you add a test for this? Probably just adding something to https://github.com/vllm-project/vllm/blob/main/tests/entrypoints/openai/test_sleep.py should suffice.
waltforme left a comment:
@njhill Absolutely. Added it to the suggested file. Thanks for checking this!
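
For reference, a hedged sketch of what such a test could look like, assuming the `RemoteOpenAIServer` helper and dev-mode environment used by the existing `tests/entrypoints/openai/test_sleep.py`, and the same assumed `GET /is_sleeping` response shape as above (this is not necessarily the exact test added in the PR):

```python
# Sketch only: helper names, env vars and the model mirror what the existing
# sleep test appears to use; the /is_sleeping response shape is an assumption.
import requests

from ...utils import RemoteOpenAIServer  # import path assumed from sibling tests

MODEL_NAME = "meta-llama/Llama-3.2-1B"  # example model


def test_is_sleeping_probe():
    args = ["--enable-sleep-mode"]
    with RemoteOpenAIServer(MODEL_NAME, args,
                            env_dict={"VLLM_SERVER_DEV_MODE": "1"}) as server:
        # A freshly started engine should report that it is awake.
        resp = requests.get(server.url_for("is_sleeping"))
        assert resp.status_code == 200
        assert resp.json().get("is_sleeping") is False

        # After putting the engine to sleep, the probe should reflect it.
        requests.post(server.url_for("sleep"), params={"level": "1"})
        resp = requests.get(server.url_for("is_sleeping"))
        assert resp.json().get("is_sleeping") is True
```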
aarnphm left a comment:
Not sure if this is a standard elsewhere, but we could follow the k8s health API endpoints for this, FWIW. (I also responded in the ticket.)
waltforme left a comment:
@aarnphm Thanks for the pointer!

```console
$ kubectl get --raw='/readyz/poststarthook/generic-apiserver-start-informers'
ok
```

Would you elaborate on what we want to follow for vLLM?
aarnphm left a comment:
See https://kubernetes.io/docs/reference/using-api/health-checks/#individual-health-checks. This is probably also related to production stack, but here is what I have in mind:
This PR exposes a read-only API endpoint to check whether the engine is sleeping. More details are documented in #14311.
FIX #14311
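
For context, here is a minimal sketch of what such a read-only probe could look like on a FastAPI-based API server; the router and engine-client names are illustrative, not necessarily the identifiers used in vLLM's `api_server.py` or in this PR:

```python
# Illustrative sketch of a read-only probe endpoint; `engine_client` and
# `is_sleeping()` are assumed names, not confirmed vLLM internals.
from fastapi import APIRouter, Request
from fastapi.responses import JSONResponse

router = APIRouter()


@router.get("/is_sleeping")
async def is_sleeping(raw_request: Request) -> JSONResponse:
    # Report the engine's sleep state without any side effects.
    engine = raw_request.app.state.engine_client
    sleeping = await engine.is_sleeping()
    return JSONResponse(content={"is_sleeping": sleeping})
```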