Pass down metrics regarding RequestMetrics #26210
Conversation
Summary: Support passing some inference metrics through request_output, so that these metrics can be passed down through the LLM engine's step(). This is useful when users need to leverage these metrics.
Test Plan: NA
Reviewed By: 22quinn
Differential Revision: D83576815
Code Review
This pull request adds support for passing more detailed inference metrics via RequestOutput. This is achieved by adding new fields to the RequestMetrics dataclass and populating it in the OutputProcessor.
My review has identified a couple of issues:
- The new logic for populating RequestMetrics misses some available data from req_state.stats and does not handle unset timestamps correctly. I've provided a suggestion to fix this.
- Crucially, this new functionality is not covered by any tests. Adding tests is essential to ensure the feature works as expected and to prevent future breakages.
Please address these points to improve the quality and reliability of the changes.
```python
request_output.metrics = RequestMetrics(
    arrival_time=req_state.stats.arrival_time,
    last_token_time=req_state.stats.last_token_ts,
    first_token_time=req_state.stats.first_token_ts,
    num_generation_tokens=req_state.stats.num_generation_tokens,
    first_token_latency=req_state.stats.first_token_latency,
)
```
This new logic for populating request_output.metrics is not covered by any tests, as indicated by 'Test Plan: NA'. New features, especially those involving data structures and logic like this, must be accompanied by tests to ensure correctness and prevent future regressions.
Please add unit tests to verify that the RequestMetrics object is populated correctly. A good place for this would be in tests/v1/engine/test_output_processor.py, for example by extending test_iteration_stats to check the contents of request_output.metrics.
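A unit test along these lines could look like the following sketch. The `FakeStats` and simplified `RequestMetrics` classes here are stand-ins I'm introducing for illustration, not vLLM's real types; an actual test would build on the existing fixtures in `tests/v1/engine/test_output_processor.py`.

```python
# Sketch of a unit test for metrics population, using simplified
# stand-ins for vLLM's RequestStateStats and RequestMetrics types.
from dataclasses import dataclass


@dataclass
class FakeStats:
    arrival_time: float = 100.0
    last_token_ts: float = 100.5
    first_token_ts: float = 100.1
    num_generation_tokens: int = 7
    first_token_latency: float = 0.1


@dataclass
class RequestMetrics:
    arrival_time: float
    last_token_time: float
    first_token_time: float
    num_generation_tokens: int
    first_token_latency: float


def build_metrics(stats: FakeStats) -> RequestMetrics:
    # Mirrors the assignment added in this PR's diff.
    return RequestMetrics(
        arrival_time=stats.arrival_time,
        last_token_time=stats.last_token_ts,
        first_token_time=stats.first_token_ts,
        num_generation_tokens=stats.num_generation_tokens,
        first_token_latency=stats.first_token_latency,
    )


def test_metrics_populated():
    metrics = build_metrics(FakeStats())
    assert metrics.arrival_time == 100.0
    assert metrics.num_generation_tokens == 7
    assert metrics.first_token_latency == 0.1


test_metrics_populated()
```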
```python
request_output.metrics = RequestMetrics(
    arrival_time=req_state.stats.arrival_time,
    last_token_time=req_state.stats.last_token_ts,
    first_token_time=req_state.stats.first_token_ts,
    num_generation_tokens=req_state.stats.num_generation_tokens,
    first_token_latency=req_state.stats.first_token_latency,
)
```
The new RequestMetrics object is missing some available metrics. req_state.stats contains scheduled_ts and queued_ts, which can be used to populate first_scheduled_time and time_in_queue. Also, timestamp fields like first_token_ts can be 0.0 if not set, which should probably be converted to None for consistency.
Consider populating more fields and handling the 0.0 case for timestamps.
```python
request_output.metrics = RequestMetrics(
    arrival_time=req_state.stats.arrival_time,
    last_token_time=req_state.stats.last_token_ts,
    first_scheduled_time=req_state.stats.scheduled_ts
    if req_state.stats.scheduled_ts > 0 else None,
    first_token_time=req_state.stats.first_token_ts
    if req_state.stats.first_token_ts > 0 else None,
    time_in_queue=(req_state.stats.scheduled_ts - req_state.stats.queued_ts)
    if req_state.stats.scheduled_ts > 0 and req_state.stats.queued_ts > 0
    else None,
    num_generation_tokens=req_state.stats.num_generation_tokens,
    first_token_latency=req_state.stats.first_token_latency)
```
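The 0.0-to-None normalization could also be factored into small helpers so the constructor call stays readable. This is an illustrative sketch only; the helper names `ts_or_none` and `queue_time` are not part of the PR.

```python
from typing import Optional


def ts_or_none(ts: float) -> Optional[float]:
    # Timestamp fields default to 0.0 when never recorded; expose None
    # instead so consumers can distinguish "not yet set" from a real time.
    return ts if ts > 0 else None


def queue_time(scheduled_ts: float, queued_ts: float) -> Optional[float]:
    # Queue time is only meaningful when both timestamps were recorded.
    if scheduled_ts > 0 and queued_ts > 0:
        return scheduled_ts - queued_ts
    return None
```

With these helpers, the suggested diff reduces to calls like `first_token_time=ts_or_none(req_state.stats.first_token_ts)`.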
💡 Codex Review
Here are some automated review suggestions for this pull request.
```python
if request_output := req_state.make_request_output(
        new_token_ids, pooling_output, finish_reason, stop_reason,
        kv_transfer_params):
    request_output.metrics = RequestMetrics(
        arrival_time=req_state.stats.arrival_time,
        last_token_time=req_state.stats.last_token_ts,
        first_token_time=req_state.stats.first_token_ts,
        num_generation_tokens=req_state.stats.num_generation_tokens,
        first_token_latency=req_state.stats.first_token_latency,
```
Guard RequestMetrics creation when stats logging is disabled
The new metrics assignment assumes req_state.stats is always populated, but RequestState.__init__ sets self.stats to None whenever log_stats=False (e.g. the default for vllm.LLM() in entrypoints/llm.py). In that configuration, request_output.metrics = RequestMetrics(arrival_time=req_state.stats.arrival_time, …) will raise an AttributeError on the first token because req_state.stats is None. Previously the code only touched self.stats inside logging code paths guarded by the same flag. Consider skipping this assignment when req_state.stats is None or synthesizing defaults to avoid breaking non-logging runs.
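One way to address this is the guard the comment describes, sketched below with simplified stand-in dataclasses rather than vLLM's real `RequestState` and `RequestOutput` types; `attach_metrics` is a hypothetical helper name used only for illustration.

```python
# Sketch: skip the metrics assignment when per-request stats tracking
# is disabled (req_state.stats is None under log_stats=False).
from dataclasses import dataclass
from typing import Optional


@dataclass
class Stats:  # stand-in for vLLM's per-request stats object
    arrival_time: float = 0.0


@dataclass
class RequestMetrics:  # simplified stand-in
    arrival_time: float


@dataclass
class RequestState:  # stand-in; stats is None when log_stats=False
    stats: Optional[Stats] = None


@dataclass
class RequestOutput:  # stand-in
    metrics: Optional[RequestMetrics] = None


def attach_metrics(req_state: RequestState, out: RequestOutput) -> None:
    # Guard: with log_stats=False there are no stats to copy, so leave
    # out.metrics as None instead of raising an AttributeError.
    if req_state.stats is None:
        return
    out.metrics = RequestMetrics(arrival_time=req_state.stats.arrival_time)
```

A non-logging run then simply leaves `request_output.metrics` as None, which callers already have to handle.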
See also #24947 - at a glance, it does something similar to this PR.