
Conversation

raptorsun (Contributor) commented on Sep 3, 2025

do not merge

Summary by CodeRabbit

  • New Features

    • Added token usage tracking for AI responses, enabling provider- and model-aware metrics without changing user-visible behavior.
    • Metrics update occurs for both standard and streaming responses, with safe fallbacks to avoid disrupting replies.
  • Tests

    • Expanded unit tests to cover token counting logic and to mock metrics in endpoint tests for stability.
  • Chores

    • Improved observability by integrating token metrics into response handling while preserving current response formats and performance.

coderabbitai bot (Contributor) commented on Sep 3, 2025

Walkthrough

Adds token-counting instrumentation for LLM turns. Introduces update_llm_token_count_from_turn in metrics.utils. Wires token metrics into query and streaming endpoints. Extends retrieve_response with a provider_id parameter and forwards it. Updates tests to mock metrics and adds a unit test for the new utility.

Changes

| Cohort / File(s) | Summary of Changes |
| --- | --- |
| Query endpoint metrics + signature (src/app/endpoints/query.py) | Import metrics helper; forward provider_id to response retrieval; extend retrieve_response signature with *, provider_id: str = ""; compute model_label and call update_llm_token_count_from_turn after the LLM response; expand pylint disables. |
| Streaming endpoint metrics (src/app/endpoints/streaming_query.py) | Import metrics helper; on turn_complete, derive system_prompt and call update_llm_token_count_from_turn; wrap in try/except with a log on failure; no other flow changes. |
| Metrics utility addition (src/metrics/utils.py) | Add update_llm_token_count_from_turn(turn, model, provider, system_prompt=""); use Tokenizer/ChatFormat to count tokens for output and input (with optional system prompt) and increment llm_token_received_total/llm_token_sent_total. A hedged sketch follows this table. |
| Query endpoint tests updated (tests/unit/app/endpoints/test_query.py) | Add a mock_metrics(mocker) helper patching app.endpoints.query.update_llm_token_count_from_turn; invoke the helper across tests to no-op metrics. |
| Streaming endpoint tests updated (tests/unit/app/endpoints/test_streaming_query.py) | Add mock_metrics(mocker) patching app.endpoints.streaming_query.update_llm_token_count_from_turn; use it in the affected tests. |
| Metrics utility tests (tests/unit/metrics/test_utis.py) | Import the new function; add test_update_llm_token_count_from_turn mocking Tokenizer/ChatFormat; assert sent/received token metric increments with labels. |
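
For orientation, here is a minimal sketch of what the new update_llm_token_count_from_turn helper plausibly looks like, pieced together from the summary above and the review comments below. The llama-models import paths, Tokenizer.get_instance(), the .tokens attribute on the encoded prompt, and the (provider, model) label order on the counters are assumptions, not taken from the diff itself.

from typing import cast

from llama_models.llama3.api.chat_format import ChatFormat
from llama_models.llama3.api.datatypes import RawMessage
from llama_models.llama3.api.tokenizer import Tokenizer

import metrics  # assumed module exposing the Prometheus counters


def update_llm_token_count_from_turn(turn, model, provider, system_prompt=""):
    """Count tokens for a completed turn and update the sent/received counters."""
    tokenizer = Tokenizer.get_instance()
    formatter = ChatFormat(tokenizer)

    # Tokens produced by the model (received by the service).
    output_tokens = formatter.encode_dialog_prompt(
        [cast(RawMessage, turn.output_message)]
    ).tokens
    metrics.llm_token_received_total.labels(provider, model).inc(len(output_tokens))

    # Tokens sent to the model: system prompt plus the turn's input messages
    # (the review below suggests role="system" for the prompt message).
    input_messages = [RawMessage(role="user", content=system_prompt)] + cast(
        list[RawMessage], turn.input_messages
    )
    input_tokens = formatter.encode_dialog_prompt(input_messages).tokens
    metrics.llm_token_sent_total.labels(provider, model).inc(len(input_tokens))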

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Client
  participant API as QueryEndpoint
  participant Core as retrieve_response
  participant LLM as Provider
  participant Metrics as metrics.utils

  Client->>API: POST /query
  API->>Core: retrieve_response(..., provider_id)
  Core->>LLM: invoke(model_id, input)
  LLM-->>Core: assistant output
  Note over Core,Metrics: New: token counting for turn
  Core->>Metrics: update_llm_token_count_from_turn(turn, model_label, provider_id, system_prompt)
  Core-->>API: TurnSummary, conversation_id
  API-->>Client: 200 OK
sequenceDiagram
  autonumber
  actor Client
  participant API as StreamingEndpoint
  participant Stream as StreamLoop
  participant Metrics as metrics.utils

  Client->>API: POST /streaming_query (SSE)
  API->>Stream: start stream loop
  loop tokens/events
    Stream-->>Client: events...
  end
  Note over Stream,Metrics: New: on turn_complete
  Stream->>Metrics: update_llm_token_count_from_turn(turn, model_id, provider_id, system_prompt)
  Stream-->>Client: stream end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • manstis
  • tisnik
  • umago

Poem

Little paws count tokens, hop by hop,
Bytes like carrots pile up in a crop.
Query streams, metrics gleam, tick-tock—
Models whisper, logs keep stock.
With every turn, I thump in cheer:
Sent and received, the numbers are clear. 🥕🐇

raptorsun closed this on Sep 3, 2025
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (5)
tests/unit/app/endpoints/test_streaming_query.py (1)

357-358: Reduce repetition by auto-mocking metrics for this module.
Consider a module-scoped autouse fixture so future tests don’t forget to patch metrics.

Apply:

+@pytest.fixture(autouse=True)
+def _mock_metrics_autouse(mocker):
+    mocker.patch(
+        "app.endpoints.streaming_query.update_llm_token_count_from_turn",
+        return_value=None,
+    )

Also applies to: 364-365

tests/unit/app/endpoints/test_query.py (1)

454-454: Avoid repetitive calls to mock_metrics across tests.
Use an autouse fixture to patch once per module and prevent omissions.

Apply:

@@
-def mock_metrics(mocker):
-    """Helper function to mock metrics operations for query endpoints."""
-    mocker.patch(
-        "app.endpoints.query.update_llm_token_count_from_turn",
-        return_value=None,
-    )
+@pytest.fixture(autouse=True)
+def _mock_metrics_autouse(mocker):
+    """Auto-mock metrics for all tests in this module."""
+    mocker.patch("app.endpoints.query.update_llm_token_count_from_turn", return_value=None)

Then remove individual mock_metrics(...) calls in tests.

Also applies to: 486-486, 519-519, 559-559, 609-609, 662-662, 716-716, 774-774, 829-829, 885-885, 955-955, 1016-1016, 1351-1351, 1402-1402

src/app/endpoints/streaming_query.py (1)

625-631: Token metrics on turn_complete: OK; minor nit on recomputing system_prompt.
You recompute system_prompt inside the loop; it’s constant per request. Compute once before the async for and close over it.

Example (outside the generator’s loop):

system_prompt = get_system_prompt(query_request, configuration)
async for chunk in turn_response:
    ...

tests/unit/metrics/test_utis.py (1)

1-1: Nit: filename typo.
Consider renaming test_utis.py → test_utils.py for clarity.

src/app/endpoints/query.py (1)

392-400: Make provider_id default safer and explicit.

Avoid empty-string labels in metrics; prefer a sentinel and validate early.

Apply:

-async def retrieve_response(  # pylint: disable=too-many-locals,too-many-branches,too-many-arguments
+async def retrieve_response(  # pylint: disable=too-many-locals,too-many-branches,too-many-arguments
     client: AsyncLlamaStackClient,
     model_id: str,
     query_request: QueryRequest,
     token: str,
     mcp_headers: dict[str, dict[str, str]] | None = None,
     *,
-    provider_id: str = "",
+    provider_id: str = "unknown",
 ) -> tuple[TurnSummary, str]:
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between f8da8e0 and 0c5e610.

📒 Files selected for processing (6)
  • src/app/endpoints/query.py (5 hunks)
  • src/app/endpoints/streaming_query.py (2 hunks)
  • src/metrics/utils.py (2 hunks)
  • tests/unit/app/endpoints/test_query.py (16 hunks)
  • tests/unit/app/endpoints/test_streaming_query.py (2 hunks)
  • tests/unit/metrics/test_utis.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (6)
src/metrics/utils.py (2)
src/client.py (1)
  • AsyncLlamaStackClientHolder (18-50)
src/configuration.py (1)
  • configuration (60-64)
tests/unit/app/endpoints/test_streaming_query.py (1)
tests/unit/app/endpoints/test_query.py (1)
  • mock_metrics (49-54)
src/app/endpoints/streaming_query.py (3)
src/metrics/utils.py (1)
  • update_llm_token_count_from_turn (60-77)
src/utils/endpoints.py (1)
  • get_system_prompt (70-101)
src/configuration.py (1)
  • configuration (60-64)
tests/unit/app/endpoints/test_query.py (1)
tests/unit/app/endpoints/test_streaming_query.py (1)
  • mock_metrics (61-66)
tests/unit/metrics/test_utis.py (1)
src/metrics/utils.py (2)
  • setup_model_metrics (20-57)
  • update_llm_token_count_from_turn (60-77)
src/app/endpoints/query.py (3)
src/metrics/utils.py (1)
  • update_llm_token_count_from_turn (60-77)
src/app/endpoints/streaming_query.py (1)
  • retrieve_response (685-796)
src/models/requests.py (1)
  • QueryRequest (72-222)
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: build-pr
  • GitHub Check: e2e_tests
🔇 Additional comments (10)
tests/unit/app/endpoints/test_streaming_query.py (1)

61-67: Good isolation of metrics side effects in streaming tests.
Mock cleanly patches update_llm_token_count_from_turn, keeping tests deterministic.

tests/unit/app/endpoints/test_query.py (1)

49-55: Good helper to null metrics during unit tests.
Keeps tests focused on behavior without counting tokens.

src/app/endpoints/streaming_query.py (1)

29-29: Import of token-counting utility is appropriate.
Matches usage in the generator; import location is fine.

src/metrics/utils.py (2)

3-10: New imports align with the added functionality.
Scope and ordering look good.


60-77: I've run the checks and confirmed that turn.output_message and turn.input_messages are already RawMessage instances and that ChatFormat.encode_dialog_prompt accepts them directly, so there is no runtime mismatch. The system prompt should indeed use role="system", not "user". Instead of the previously proposed defensive normalization, simply update the system prompt's role:

-    input_messages = [RawMessage(role="user", content=system_prompt)] + cast(
-        list[RawMessage], turn.input_messages
-    )
+    input_messages = []
+    if system_prompt:
+        input_messages.append(RawMessage(role="system", content=system_prompt))
+    input_messages += cast(list[RawMessage], turn.input_messages)

No provider-specific mapping needed here.

tests/unit/metrics/test_utis.py (2)

3-3: Import coverage looks good.
You’re testing both setup and the new token-count function.


79-124: Solid unit test for token-count updates with clear expectations.
Mocks are sufficient and assertions target the right metric labels.

src/app/endpoints/query.py (3)

27-27: Import for token metrics is correct.

Import location and symbol name match usage below. No issues.


222-223: Good: provider_id forwarded explicitly.

Keyword-only usage avoids positional breakage and keeps call sites clear.


418-419: Docstring updated correctly.

Parameter docs align with signature and intent.

Comment on lines +518 to +521
# Update token count metrics for the LLM call
model_label = model_id.split("/", 1)[1] if "/" in model_id else model_id
update_llm_token_count_from_turn(response, model_label, provider_id, system_prompt)

🛠️ Refactor suggestion

Harden metrics update to never break the endpoint; use last path segment.

Instrumentation must be best-effort. If Tokenizer/formatter or turn fields are missing, this can raise and 500 the request. Also, rsplit is safer for nested identifiers.

Apply:

-    # Update token count metrics for the LLM call
-    model_label = model_id.split("/", 1)[1] if "/" in model_id else model_id
-    update_llm_token_count_from_turn(response, model_label, provider_id, system_prompt)
+    # Update token count metrics for the LLM call (best-effort; do not disrupt response)
+    try:
+        model_label = model_id.rsplit("/", 1)[-1]
+        update_llm_token_count_from_turn(
+            response, model_label, provider_id or "unknown", system_prompt
+        )
+    except Exception as err:  # pylint: disable=broad-except
+        logger.warning("Failed to update token count metrics: %s", err)

If you prefer, we can centralize this “safe update” in a tiny helper to reuse here and in streaming_query.

🤖 Prompt for AI Agents
In src/app/endpoints/query.py around lines 518-521, the metrics update can raise
if tokenizer/formatter or expected fields on the turn/response are missing and
using split("/") can pick the wrong segment for nested identifiers; change to
use rsplit("/", 1) to get the last path segment for model_label and wrap the
call to update_llm_token_count_from_turn in a try/except that swallows
exceptions (and logs a debug/warn) so instrumentation is best-effort and cannot
500 the endpoint; optionally factor this logic into a small helper (used here
and in streaming_query) that extracts the safe model_label and invokes the
update inside a non-throwing guard.
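
For completeness, a possible shape for the shared "safe update" helper floated in the refactor suggestion above; the helper name, its module location, and the logger wiring are hypothetical, while the guarded logic mirrors the diff in the suggestion.

import logging

from metrics.utils import update_llm_token_count_from_turn

logger = logging.getLogger(__name__)


def safe_update_llm_token_count(turn, model_id, provider_id, system_prompt):
    """Best-effort token metrics update that never propagates exceptions to the endpoint."""
    try:
        model_label = model_id.rsplit("/", 1)[-1]
        update_llm_token_count_from_turn(
            turn, model_label, provider_id or "unknown", system_prompt
        )
    except Exception as err:  # pylint: disable=broad-except
        logger.warning("Failed to update token count metrics: %s", err)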
