
Conversation

@frank-wei
Contributor

Summary:
Support passing some inference metrics through request_output, so that they can be surfaced through the LLM engine's step().
This is useful when users need to consume these metrics downstream.

Test Plan: NA

Reviewed By: 22quinn

Differential Revision: D83576815


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds support for passing more detailed inference metrics via RequestOutput. This is achieved by adding new fields to the RequestMetrics dataclass and populating it in the OutputProcessor.

My review has identified a couple of issues:

  1. The new logic for populating RequestMetrics misses some available data from req_state.stats and doesn't handle unset timestamps correctly. I've provided a suggestion to fix this.
  2. Crucially, this new functionality is not covered by any tests. Adding tests is essential to ensure the feature works as expected and to prevent future breakages.

Please address these points to improve the quality and reliability of the changes.

Comment on lines +449 to +455
request_output.metrics = RequestMetrics(
    arrival_time=req_state.stats.arrival_time,
    last_token_time=req_state.stats.last_token_ts,
    first_token_time=req_state.stats.first_token_ts,
    num_generation_tokens=req_state.stats.num_generation_tokens,
    first_token_latency=req_state.stats.first_token_latency,
)


critical

This new logic for populating request_output.metrics is not covered by any tests, as indicated by 'Test Plan: NA'. New features, especially those involving data structures and logic like this, must be accompanied by tests to ensure correctness and prevent future regressions.

Please add unit tests to verify that the RequestMetrics object is populated correctly. A good place for this would be in tests/v1/engine/test_output_processor.py, for example by extending test_iteration_stats to check the contents of request_output.metrics.
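A test along the lines suggested above could look like the following sketch. This is not vLLM's actual test code: `FakeStats`, `FakeRequestMetrics`, and `build_metrics` are minimal stand-ins for `req_state.stats`, `RequestMetrics`, and the population logic in the diff, used only to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeStats:
    """Stand-in for req_state.stats (hypothetical, for illustration)."""
    arrival_time: float
    last_token_ts: float
    first_token_ts: float
    num_generation_tokens: int
    first_token_latency: float


@dataclass
class FakeRequestMetrics:
    """Stand-in for vLLM's RequestMetrics dataclass."""
    arrival_time: float
    last_token_time: float
    first_token_time: Optional[float]
    num_generation_tokens: int
    first_token_latency: float


def build_metrics(stats: FakeStats) -> FakeRequestMetrics:
    # Mirrors the population logic in the diff under review.
    return FakeRequestMetrics(
        arrival_time=stats.arrival_time,
        last_token_time=stats.last_token_ts,
        first_token_time=stats.first_token_ts,
        num_generation_tokens=stats.num_generation_tokens,
        first_token_latency=stats.first_token_latency,
    )


def test_metrics_populated():
    stats = FakeStats(arrival_time=100.0, last_token_ts=101.5,
                      first_token_ts=100.2, num_generation_tokens=7,
                      first_token_latency=0.2)
    metrics = build_metrics(stats)
    assert metrics.arrival_time == 100.0
    assert metrics.first_token_time == 100.2
    assert metrics.num_generation_tokens == 7


test_metrics_populated()
```

A real test in tests/v1/engine/test_output_processor.py would instead drive the OutputProcessor and inspect request_output.metrics directly.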

Comment on lines +449 to +455
request_output.metrics = RequestMetrics(
    arrival_time=req_state.stats.arrival_time,
    last_token_time=req_state.stats.last_token_ts,
    first_token_time=req_state.stats.first_token_ts,
    num_generation_tokens=req_state.stats.num_generation_tokens,
    first_token_latency=req_state.stats.first_token_latency,
)


high

The new RequestMetrics object is missing some available metrics. req_state.stats contains scheduled_ts and queued_ts, which can be used to populate first_scheduled_time and time_in_queue. Also, timestamp fields like first_token_ts can be 0.0 if not set, which should probably be converted to None for consistency.

Consider populating more fields and handling the 0.0 case for timestamps.

request_output.metrics = RequestMetrics(
    arrival_time=req_state.stats.arrival_time,
    last_token_time=req_state.stats.last_token_ts,
    first_scheduled_time=(req_state.stats.scheduled_ts
                          if req_state.stats.scheduled_ts > 0 else None),
    first_token_time=(req_state.stats.first_token_ts
                      if req_state.stats.first_token_ts > 0 else None),
    time_in_queue=(req_state.stats.scheduled_ts - req_state.stats.queued_ts
                   if req_state.stats.scheduled_ts > 0
                   and req_state.stats.queued_ts > 0 else None),
    num_generation_tokens=req_state.stats.num_generation_tokens,
    first_token_latency=req_state.stats.first_token_latency,
)
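Since several fields repeat the same "0.0 means unset" ternary, one option is to centralize that convention in a tiny helper. The helper name `ts_or_none` below is hypothetical, not part of vLLM; this is only a sketch of the idea.

```python
from typing import Optional


def ts_or_none(ts: float) -> Optional[float]:
    """Treat a 0.0 timestamp as 'not yet set' (hypothetical helper)."""
    return ts if ts > 0 else None


# With this helper, each timestamp field reduces to one call, e.g.:
#   first_token_time=ts_or_none(req_state.stats.first_token_ts)
assert ts_or_none(0.0) is None
assert ts_or_none(12.5) == 12.5
```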


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines 446 to +454
if request_output := req_state.make_request_output(
        new_token_ids, pooling_output, finish_reason, stop_reason,
        kv_transfer_params):
    request_output.metrics = RequestMetrics(
        arrival_time=req_state.stats.arrival_time,
        last_token_time=req_state.stats.last_token_ts,
        first_token_time=req_state.stats.first_token_ts,
        num_generation_tokens=req_state.stats.num_generation_tokens,
        first_token_latency=req_state.stats.first_token_latency,


P0: Guard RequestMetrics creation when stats logging is disabled

The new metrics assignment assumes req_state.stats is always populated, but RequestState.__init__ sets self.stats to None whenever log_stats=False (e.g. the default for vllm.LLM() in entrypoints/llm.py). In that configuration, request_output.metrics = RequestMetrics(arrival_time=req_state.stats.arrival_time, …) will raise an AttributeError on the first token because req_state.stats is None. Previously the code only touched self.stats inside logging code paths guarded by the same flag. Consider skipping this assignment when req_state.stats is None or synthesizing defaults to avoid breaking non‑logging runs.
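The guard described above could look like the following sketch. All types here (`Stats`, `RequestState`, `RequestOutput`, `attach_metrics`) are simplified stand-ins that only mimic the relevant behavior — vLLM's real `RequestState.__init__` leaving `stats` as `None` when `log_stats=False` — so the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Stats:
    """Stand-in for the per-request stats object."""
    arrival_time: float = 0.0
    num_generation_tokens: int = 0


class RequestState:
    """Stand-in mimicking vLLM's behavior: stats is None unless
    log_stats=True."""

    def __init__(self, log_stats: bool):
        self.stats: Optional[Stats] = Stats() if log_stats else None


@dataclass
class RequestOutput:
    metrics: Optional[dict] = None


def attach_metrics(req_state: RequestState, out: RequestOutput) -> None:
    # The missing guard: skip the assignment when stats were never
    # collected, instead of dereferencing None.
    if req_state.stats is None:
        return
    out.metrics = {
        "arrival_time": req_state.stats.arrival_time,
        "num_generation_tokens": req_state.stats.num_generation_tokens,
    }


# With log_stats=False there is no AttributeError; metrics stays None.
out = RequestOutput()
attach_metrics(RequestState(log_stats=False), out)
assert out.metrics is None
```

The alternative mentioned in the comment, synthesizing default values when stats is None, would keep request_output.metrics always populated at the cost of reporting placeholder numbers for non-logging runs.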


@markmc
Member

markmc commented Oct 6, 2025

See also #24947 - at a glance, it is doing a similar thing as this PR

@frank-wei
Contributor Author

See also #24947 - at a glance, it is doing a similar thing as this PR

Thanks @markmc for the pointer. I will close this PR.

@frank-wei frank-wei closed this Oct 6, 2025