
Conversation


@WoosukKwon WoosukKwon commented Aug 22, 2025

Closes #23233

This PR restructures the core loop for both synchronous and asynchronous scheduling.

For sync scheduling, this allows overlapping grammar-bitmask construction with model execution.
For async scheduling, this allows overlapping input preparation (and bitmask construction) with model execution. Currently, only deserialization of scheduler outputs is overlapped; this could be expanded in future PRs.
Additionally, this PR refactors the async scheduling loop to make it easier to understand, and enables structured outputs support.
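The sync-scheduling overlap described above can be sketched roughly as follows. This is an illustrative sketch only: `run_model` and `build_bitmask` are hypothetical stand-ins for vLLM's `execute_model` and `get_grammar_bitmask`, with sleeps simulating the actual work.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_model() -> str:
    time.sleep(0.10)          # stands in for the GPU forward pass
    return "hidden_states"

def build_bitmask() -> str:
    time.sleep(0.08)          # stands in for CPU-side grammar bitmask work
    return "bitmask"

def step_overlapped():
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=1) as pool:
        model_future = pool.submit(run_model)   # kick off "execution"
        bitmask = build_bitmask()               # build bitmask concurrently
        hidden = model_future.result()          # join before sampling
    return hidden, bitmask, time.perf_counter() - start

hidden, bitmask, elapsed = step_overlapped()
# With overlap, the step takes roughly max(0.10, 0.08) s
# rather than the serial 0.10 + 0.08 s.
print(hidden, bitmask, elapsed)
```

The point of the restructuring is exactly this shape: once `execute_model` is issued, the CPU is free to do the bitmask work instead of idling until the forward pass finishes.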


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request restructures the core scheduling loop to better support asynchronous operations, which is a significant and positive change. My review focuses on ensuring that this refactoring doesn't introduce regressions. I've identified a few issues: a critical bug where the RayDistributedExecutor is not updated to the new interface, and a couple of high-severity issues related to missing error handling and a missing import. Addressing these will help ensure the stability and correctness of the new implementation.

Comment on lines 82 to 90
    def prepare_inputs(self, scheduler_output) -> None:
        self.collective_rpc("prepare_inputs", args=(scheduler_output, ))

    def execute_model(self) -> None:
        self.collective_rpc("execute_model")

    def sample(self, grammar_bitmask) -> ModelRunnerOutput:
        output = self.collective_rpc("sample", args=(grammar_bitmask, ))
        return output[0]

critical

The RayDistributedExecutor in vllm/v1/executor/ray_distributed_executor.py has not been updated to implement the new abstract methods prepare_inputs, execute_model, and sample. It still implements the old execute_model method. This will lead to a TypeError at runtime when using the Ray executor, as it won't be a valid subclass of the Executor abstract base class. Please update RayDistributedExecutor to align with this new interface.

Comment on lines 289 to 294
        self.model_executor.prepare_inputs(scheduler_output)
        self.model_executor.execute_model()
        bitmask = self.scheduler.get_grammar_bitmask(scheduler_output)
        model_output = self.model_executor.sample(bitmask)
        engine_core_outputs = self.scheduler.update_from_output(
-           scheduler_output, model_output)  # type: ignore
+           scheduler_output, model_output)

high

The call to execute_model_with_error_logging has been removed in this refactoring of the step method, and it's also missing from the new step_async method. This is a potential regression as model execution errors will no longer be caught and logged with detailed context, which can make debugging difficult. It's recommended to reintroduce the error handling in both methods. For step, you could wrap the new multi-step execution logic in a helper function and pass it to execute_model_with_error_logging.

Suggested change
-        self.model_executor.prepare_inputs(scheduler_output)
-        self.model_executor.execute_model()
-        bitmask = self.scheduler.get_grammar_bitmask(scheduler_output)
-        model_output = self.model_executor.sample(bitmask)
-        engine_core_outputs = self.scheduler.update_from_output(
-            scheduler_output, model_output)  # type: ignore
-            scheduler_output, model_output)
+        def _model_step(scheduler_output: "SchedulerOutput"):
+            self.model_executor.prepare_inputs(scheduler_output)
+            self.model_executor.execute_model()
+            bitmask = self.scheduler.get_grammar_bitmask(scheduler_output)
+            return self.model_executor.sample(bitmask)
+        model_output = self.execute_model_with_error_logging(
+            _model_step, scheduler_output)
+        engine_core_outputs = self.scheduler.update_from_output(
+            scheduler_output, model_output)
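The wrapper pattern this suggestion relies on can be sketched generically. `execute_with_error_logging` below is a hypothetical stand-in; vLLM's actual `execute_model_with_error_logging` may differ in signature and logged context.

```python
import logging
from typing import Callable, TypeVar

logger = logging.getLogger("engine")
T = TypeVar("T")

def execute_with_error_logging(fn: Callable[..., T], *args) -> T:
    """Run fn, logging detailed context on failure before re-raising."""
    try:
        return fn(*args)
    except Exception:
        # The traceback plus the arguments make post-mortem debugging
        # of model-execution failures much easier.
        logger.exception("Model execution failed (args=%r)", args)
        raise

# Hypothetical step function standing in for the PR's _model_step.
def _model_step(scheduler_output: str) -> str:
    return f"output-for-{scheduler_output}"

result = execute_with_error_logging(_model_step, "batch-0")
print(result)  # output-for-batch-0
```

The benefit is that every code path through the step, not just the single `execute_model` call, is covered by the same error-reporting logic.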

Signed-off-by: Woosuk Kwon <[email protected]>
    ) -> Union[ModelRunnerOutput, Future[ModelRunnerOutput]]:
        output = self.collective_rpc("execute_model",
                                     args=(scheduler_output, ))
        del non_block
What's this for?

mergify bot commented Aug 25, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @WoosukKwon.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Aug 25, 2025
@WoosukKwon WoosukKwon marked this pull request as draft August 25, 2025 18:24
Ronald1995 commented Aug 28, 2025

I don't understand how you implement the overlap of prepare_input; it looks like the prepare_input, execute_model, and sample tasks are still executed in order in the worker process.

In #23811, my solution uses two threads in the worker process to overlap prepare_input with the D2H copy operations. Could you please explain how your solution overlaps prepare_input? Thanks!
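For context, the kind of overlap the PR aims for can be illustrated as simple pipelining: input preparation for step N+1 runs while step N is still executing. This is illustrative only; the names and timings are invented, and it is not how vLLM schedules work internally.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def execute(step: int) -> str:
    time.sleep(0.05)                 # stands in for the GPU forward pass
    return f"out-{step}"

def prepare(step: int) -> str:
    time.sleep(0.04)                 # stands in for CPU input preparation
    return f"in-{step}"

outputs = []
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=1) as pool:
    pending = pool.submit(execute, 0)        # step 0 already prepared
    for step in range(1, 4):
        prepare(step)                        # overlaps with execute(step - 1)
        outputs.append(pending.result())     # join previous step
        pending = pool.submit(execute, step)
    outputs.append(pending.result())
elapsed = time.perf_counter() - start
# Pipelined: roughly 4 * 0.05 s total, versus 4 * (0.05 + 0.04) s serial.
print(outputs, elapsed)
```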



Successfully merging this pull request may close these issues.

[RFC]: Restructure the core loop to allow more asynchrony
