[do not merge] trigger test #497
Conversation
Signed-off-by: Haoyu Sun <[email protected]>
Walkthrough
Adds token-counting instrumentation for LLM turns. Introduces update_llm_token_count_from_turn in metrics.utils. Wires token metrics into the query and streaming endpoints. Extends retrieve_response with a provider_id parameter and forwards it. Updates tests to mock metrics and adds a unit test for the new utility.
Sequence Diagram(s)

sequenceDiagram
autonumber
actor Client
participant API as QueryEndpoint
participant Core as retrieve_response
participant LLM as Provider
participant Metrics as metrics.utils
Client->>API: POST /query
API->>Core: retrieve_response(..., provider_id)
Core->>LLM: invoke(model_id, input)
LLM-->>Core: assistant output
Note over Core,Metrics: New: token counting for turn
Core->>Metrics: update_llm_token_count_from_turn(turn, model_label, provider_id, system_prompt)
Core-->>API: TurnSummary, conversation_id
API-->>Client: 200 OK
sequenceDiagram
autonumber
actor Client
participant API as StreamingEndpoint
participant Stream as StreamLoop
participant Metrics as metrics.utils
Client->>API: POST /streaming_query (SSE)
API->>Stream: start stream loop
loop tokens/events
Stream-->>Client: events...
end
Note over Stream,Metrics: New: on turn_complete
Stream->>Metrics: update_llm_token_count_from_turn(turn, model_id, provider_id, system_prompt)
Stream-->>Client: stream end
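
The walkthrough and diagrams above center on the new update_llm_token_count_from_turn(turn, model_label, provider_id, system_prompt) utility in src/metrics/utils.py. For orientation while reviewing, here is a minimal sketch of what such a utility can look like; the import paths, Prometheus counter names, and exact token accounting below are assumptions for illustration, not the code in this PR.

# Hedged sketch only: import paths, counter names, and the token accounting
# are assumptions for illustration, not this repository's implementation.
from typing import cast

from llama_models.llama3.api.chat_format import ChatFormat  # assumed import path
from llama_models.llama3.api.datatypes import RawMessage  # assumed import path
from llama_models.llama3.api.tokenizer import Tokenizer  # assumed import path
from prometheus_client import Counter

# Hypothetical counters; the real metric names and labels may differ.
llm_token_sent_total = Counter(
    "llm_token_sent_total", "Tokens sent to the LLM", ["provider", "model"]
)
llm_token_received_total = Counter(
    "llm_token_received_total", "Tokens received from the LLM", ["provider", "model"]
)


def update_llm_token_count_from_turn(
    turn, model: str, provider: str, system_prompt: str = ""
) -> None:
    """Count prompt and completion tokens for a finished turn and update counters."""
    tokenizer = Tokenizer.get_instance()
    formatter = ChatFormat(tokenizer)

    # Completion side: encode the assistant's output message and count its tokens.
    output_tokens = formatter.encode_dialog_prompt(
        [cast(RawMessage, turn.output_message)]
    ).tokens
    llm_token_received_total.labels(provider, model).inc(len(output_tokens or []))

    # Prompt side: system prompt plus the turn's input messages (see the review
    # note below about using role="system" for the system prompt).
    input_messages = [RawMessage(role="system", content=system_prompt)] + cast(
        list[RawMessage], turn.input_messages
    )
    input_tokens = formatter.encode_dialog_prompt(input_messages).tokens
    llm_token_sent_total.labels(provider, model).inc(len(input_tokens or []))
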
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Actionable comments posted: 1
🧹 Nitpick comments (5)
tests/unit/app/endpoints/test_streaming_query.py (1)
357-358: Reduce repetition by auto-mocking metrics for this module.
Consider a module-scoped autouse fixture so future tests don’t forget to patch metrics. Apply:

+@pytest.fixture(autouse=True)
+def _mock_metrics_autouse(mocker):
+    mocker.patch(
+        "app.endpoints.streaming_query.update_llm_token_count_from_turn",
+        return_value=None,
+    )

Also applies to: 364-365
tests/unit/app/endpoints/test_query.py (1)
454-454: Avoid repetitive calls to mock_metrics across tests.
Use an autouse fixture to patch once per module and prevent omissions. Apply:

@@
-def mock_metrics(mocker):
-    """Helper function to mock metrics operations for query endpoints."""
-    mocker.patch(
-        "app.endpoints.query.update_llm_token_count_from_turn",
-        return_value=None,
-    )
+@pytest.fixture(autouse=True)
+def _mock_metrics_autouse(mocker):
+    """Auto-mock metrics for all tests in this module."""
+    mocker.patch("app.endpoints.query.update_llm_token_count_from_turn", return_value=None)

Then remove individual mock_metrics(...) calls in tests.
Also applies to: 486-486, 519-519, 559-559, 609-609, 662-662, 716-716, 774-774, 829-829, 885-885, 955-955, 1016-1016, 1351-1351, 1402-1402
src/app/endpoints/streaming_query.py (1)
625-631: Token metrics on turn_complete: OK; minor nit on recomputing system_prompt.
You recompute system_prompt inside the loop; it’s constant per request. Compute once before the async for and close over it. Example (outside the generator’s loop):

system_prompt = get_system_prompt(query_request, configuration)
async for chunk in turn_response:
    ...

tests/unit/metrics/test_utis.py (1)
1-1: Nit: filename typo.
Consider renaming test_utis.py → test_utils.py for clarity.src/app/endpoints/query.py (1)
392-400: Make provider_id default safer and explicit.
Avoid empty-string labels in metrics; prefer a sentinel and validate early.
Apply:

-async def retrieve_response(  # pylint: disable=too-many-locals,too-many-branches,too-many-arguments
+async def retrieve_response(  # pylint: disable=too-many-locals,too-many-branches,too-many-arguments
     client: AsyncLlamaStackClient,
     model_id: str,
     query_request: QueryRequest,
     token: str,
     mcp_headers: dict[str, dict[str, str]] | None = None,
     *,
-    provider_id: str = "",
+    provider_id: str = "unknown",
 ) -> tuple[TurnSummary, str]:
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (6)
- src/app/endpoints/query.py (5 hunks)
- src/app/endpoints/streaming_query.py (2 hunks)
- src/metrics/utils.py (2 hunks)
- tests/unit/app/endpoints/test_query.py (16 hunks)
- tests/unit/app/endpoints/test_streaming_query.py (2 hunks)
- tests/unit/metrics/test_utis.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (6)
src/metrics/utils.py (2)
- src/client.py (1): AsyncLlamaStackClientHolder (18-50)
- src/configuration.py (1): configuration (60-64)

tests/unit/app/endpoints/test_streaming_query.py (1)
- tests/unit/app/endpoints/test_query.py (1): mock_metrics (49-54)

src/app/endpoints/streaming_query.py (3)
- src/metrics/utils.py (1): update_llm_token_count_from_turn (60-77)
- src/utils/endpoints.py (1): get_system_prompt (70-101)
- src/configuration.py (1): configuration (60-64)

tests/unit/app/endpoints/test_query.py (1)
- tests/unit/app/endpoints/test_streaming_query.py (1): mock_metrics (61-66)

tests/unit/metrics/test_utis.py (1)
- src/metrics/utils.py (2): setup_model_metrics (20-57), update_llm_token_count_from_turn (60-77)

src/app/endpoints/query.py (3)
- src/metrics/utils.py (1): update_llm_token_count_from_turn (60-77)
- src/app/endpoints/streaming_query.py (1): retrieve_response (685-796)
- src/models/requests.py (1): QueryRequest (72-222)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build-pr
- GitHub Check: e2e_tests
🔇 Additional comments (10)
tests/unit/app/endpoints/test_streaming_query.py (1)
61-67: Good isolation of metrics side effects in streaming tests.
Mock cleanly patches update_llm_token_count_from_turn, keeping tests deterministic.

tests/unit/app/endpoints/test_query.py (1)
49-55: Good helper to null metrics during unit tests.
Keeps tests focused on behavior without counting tokens.

src/app/endpoints/streaming_query.py (1)
29-29: Import of token-counting utility is appropriate.
Matches usage in the generator; import location is fine.

src/metrics/utils.py (2)
3-10: New imports align with the added functionality.
Scope and ordering look good.
60-77: I’ve run the checks and confirmed that turn.output_message and turn.input_messages are already RawMessage instances and that ChatFormat.encode_dialog_prompt accepts them directly, so there is no runtime mismatch. The system prompt should indeed be role="system", not "user". Rather than the proposed defensive normalization, simply update the system prompt’s role:

-    input_messages = [RawMessage(role="user", content=system_prompt)] + cast(
-        list[RawMessage], turn.input_messages
-    )
+    input_messages = []
+    if system_prompt:
+        input_messages.append(RawMessage(role="system", content=system_prompt))
+    input_messages += cast(list[RawMessage], turn.input_messages)

No provider-specific mapping needed here.
tests/unit/metrics/test_utis.py (2)
3-3: Import coverage looks good.
You’re testing both setup and the new token-count function.

79-124: Solid unit test for token-count updates with clear expectations.
Mocks are sufficient and assertions target the right metric labels.

src/app/endpoints/query.py (3)
27-27: Import for token metrics is correct.
Import location and symbol name match usage below. No issues.

222-223: Good: provider_id forwarded explicitly.
Keyword-only usage avoids positional breakage and keeps call sites clear.

418-419: Docstring updated correctly.
Parameter docs align with signature and intent.
# Update token count metrics for the LLM call
model_label = model_id.split("/", 1)[1] if "/" in model_id else model_id
update_llm_token_count_from_turn(response, model_label, provider_id, system_prompt)
🛠️ Refactor suggestion
Harden metrics update to never break the endpoint; use last path segment.
Instrumentation must be best-effort. If Tokenizer/formatter or turn fields are missing, this can raise and 500 the request. Also, rsplit is safer for nested identifiers.
Apply:
-# Update token count metrics for the LLM call
-model_label = model_id.split("/", 1)[1] if "/" in model_id else model_id
-update_llm_token_count_from_turn(response, model_label, provider_id, system_prompt)
+# Update token count metrics for the LLM call (best-effort; do not disrupt response)
+try:
+    model_label = model_id.rsplit("/", 1)[-1]
+    update_llm_token_count_from_turn(
+        response, model_label, provider_id or "unknown", system_prompt
+    )
+except Exception as err:  # pylint: disable=broad-except
+    logger.warning("Failed to update token count metrics: %s", err)
If you prefer, we can centralize this “safe update” in a tiny helper to reuse here and in streaming_query.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
# Update token count metrics for the LLM call (best-effort; do not disrupt response)
try:
    model_label = model_id.rsplit("/", 1)[-1]
    update_llm_token_count_from_turn(
        response, model_label, provider_id or "unknown", system_prompt
    )
except Exception as err:  # pylint: disable=broad-except
    logger.warning("Failed to update token count metrics: %s", err)
🤖 Prompt for AI Agents
In src/app/endpoints/query.py around lines 518-521, the metrics update can raise
if tokenizer/formatter or expected fields on the turn/response are missing and
using split("/") can pick the wrong segment for nested identifiers; change to
use rsplit("/", 1) to get the last path segment for model_label and wrap the
call to update_llm_token_count_from_turn in a try/except that swallows
exceptions (and logs a debug/warn) so instrumentation is best-effort and cannot
500 the endpoint; optionally factor this logic into a small helper (used here
and in streaming_query) that extracts the safe model_label and invokes the
update inside a non-throwing guard.
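
To make the “tiny helper” idea above concrete, here is a hedged sketch of a shared best-effort wrapper that query.py and streaming_query.py could both call; the helper name and its placement are hypothetical, not part of this PR.

# Hypothetical helper; name and location are illustrative only.
import logging

from metrics.utils import update_llm_token_count_from_turn

logger = logging.getLogger(__name__)


def safely_update_llm_token_count(
    turn, model_id: str, provider_id: str, system_prompt: str
) -> None:
    """Best-effort token-count update: never let metrics errors reach the endpoint."""
    try:
        # Use the last path segment so nested identifiers map to a stable model label.
        model_label = model_id.rsplit("/", 1)[-1]
        update_llm_token_count_from_turn(
            turn, model_label, provider_id or "unknown", system_prompt
        )
    except Exception as err:  # pylint: disable=broad-except
        logger.warning("Failed to update token count metrics: %s", err)

Both endpoints would then call safely_update_llm_token_count(...) with the completed turn in place of the direct metric update.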
do not merge
Summary by CodeRabbit
New Features
Tests
Chores