
Conversation


@luis5tb commented Sep 15, 2025

Description

Add v2 endpoints to interact with Llama Stack using the Responses API instead of the Agent API.

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement

Related Tickets & Documents

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

Testing this requires the Llama Stack changes in llamastack/llama-stack#3385. I used assisted-chat to test the end-to-end solution and created new scripts to exercise the new v2 endpoints (make query_v2 or make streaming_query_v2).

Summary by CodeRabbit

  • New Features

    • Added V2 Query APIs: POST /v2/query and /v2/streaming_query (SSE). Includes RBAC, model/provider selection, RAG/MCP tool integration, attachment validation, conversation persistence, metrics, and robust error handling.
  • Improvements

    • Updated routing to clearly separate V1/V2/V3, exposing new V2 endpoints and adding a V3 conversations router.
  • Chores

    • Bumped maximum supported Llama Stack version to 0.2.22.
  • Tests

    • Added comprehensive unit tests for V2 query and streaming endpoints and updated router tests to cover new versions.

Contributor

coderabbitai bot commented Sep 15, 2025

Walkthrough

Adds v2 query and streaming endpoints using the Responses API, wires them under /v2 in routers, updates supported Llama Stack version, enhances SUID validation to accept Response IDs, and introduces comprehensive unit tests for new endpoints and routing.

Changes

  • V2 Query Endpoint (src/app/endpoints/query_v2.py):
    New POST /v2/query endpoint with RBAC, model/provider resolution, Response API call, RAG/MCP tool assembly, transcript persistence, metrics, and error handling. Exposes router and helper functions (retrieve_response, get_rag_tools, get_mcp_tools).
  • V2 Streaming Endpoint (src/app/endpoints/streaming_query_v2.py):
    New streaming /v2/streaming_query endpoint emitting SSE events. Builds tools, streams tokens/tool calls, persists conversations, updates metrics, and handles connection errors.
  • Routers Wiring (src/app/routers.py):
    Imports and includes v2 routers (query_v2, streaming_query_v2) under /v2; retains v1; adds v3 conversations router inclusion (see the sketch after this list).
  • Constants (src/constants.py):
    Bumps MAXIMAL_SUPPORTED_LLAMA_STACK_VERSION from "0.2.20" to "0.2.22".
  • Utilities (src/utils/suid.py):
    check_suid now accepts Response IDs (resp-/resp_), normalizes and validates as UUIDs; input guard for non-strings/empty. Signature unchanged.
  • Unit Tests — Query V2 (tests/unit/app/endpoints/test_query_v2.py):
    Tests RAG/MCP tool builders, retrieve_response behavior (no-tools, combined tools, parsing output/tool calls, attachment validation), endpoint success and APIConnectionError paths with metrics.
  • Unit Tests — Streaming Query V2 (tests/unit/app/endpoints/test_streaming_query_v2.py):
    Tests retrieve_response tool building, no-tools path, streaming flow event sequence, metrics, conversation persistence, and APIConnectionError handling.
  • Unit Tests — Routers (tests/unit/app/test_routers.py):
    Verifies presence and /v2 prefixes for new routers (query_v2, streaming_query_v2, conversations_v2), and updated router count.
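
The router wiring amounts to mounting the new endpoints under a versioned prefix. A minimal sketch, assuming FastAPI and the module layout listed above; the actual import paths and registration helper in src/app/routers.py may differ:

# Hypothetical sketch of the versioned wiring; module and helper names
# are assumptions based on the file summaries above.
from fastapi import FastAPI

from app.endpoints import query, query_v2, streaming_query, streaming_query_v2

def include_routers(app: FastAPI) -> None:
    # v1 endpoints (Agent API) keep their existing prefix
    app.include_router(query.router, prefix="/v1")
    app.include_router(streaming_query.router, prefix="/v1")
    # v2 endpoints (Responses API) are mounted side by side under /v2
    app.include_router(query_v2.router, prefix="/v2")
    app.include_router(streaming_query_v2.router, prefix="/v2")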

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Client
  participant API as API (/v2/query)
  participant Auth as RBAC/Auth
  participant Selector as Model/Provider Selector
  participant Tools as Tool Builder (RAG/MCP)
  participant Llama as Responses API Client
  participant Store as Transcripts/Conversations
  Client->>API: POST /v2/query (QueryRequest)
  API->>Auth: Validate + authorize
  Auth-->>API: Auth tuple
  API->>Selector: Resolve model/provider
  Selector-->>API: model_id, provider_id
  API->>Tools: Build tools (RAG, MCP) + validate attachments
  Tools-->>API: tools[]
  API->>Llama: responses.create(...)
  Llama-->>API: response (text, tool_calls, response_id)
  API->>Store: Persist transcript/conversation
  Store-->>API: ack
  API-->>Client: 200 QueryResponse (conversation_id, text)
  note over API,Llama: On APIConnectionError → 500 and failure metric
sequenceDiagram
  autonumber
  participant Client
  participant API as API (/v2/streaming_query)
  participant Auth as RBAC/Auth
  participant Selector as Model/Provider Selector
  participant Tools as Tool Builder (RAG/MCP)
  participant Llama as Responses API Stream
  participant Store as Conversations/Transcripts
  Client->>API: POST /v2/streaming_query
  API->>Auth: Validate + authorize
  Auth-->>API: Auth tuple
  API->>Selector: Resolve model/provider
  Selector-->>API: model_id
  API->>Tools: Build tools / validate
  Tools-->>API: tools[]
  API->>Llama: responses.create(stream=True)
  Llama-->>API: SSE chunks (start, tokens, tool_calls, end)
  API-->>Client: SSE stream
  API->>Store: Persist conversation on first id
  note over API,Llama: On connection error → 500 JSON error

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • manstis
  • jrobertboos
  • umago

Poem

A bunny taps the v2 drum—thump-thump, hooray!
Tools hop in: RAG and MCP join the play.
Streams ripple tokens, a conversational brook,
IDs validated with a whisker’s look.
Versions nudge forward, carrots aligned—
Ship the routes, metrics chimed, all nicely defined! 🥕✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 41.38%, below the required threshold of 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (2 passed)

  • Description Check: ✅ Passed. Check skipped: CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed. The title “Responses v2 support” succinctly captures the central feature of the pull request, which is adding support for the version 2 Responses API endpoints. It is concise, directly relevant to the main changes in the diff, and avoids extraneous details or ambiguous phrasing. A teammate reviewing the history would immediately understand that this PR introduces v2 of the Responses API.

Contributor

@coderabbitai bot left a comment

Actionable comments posted: 14

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/app/endpoints/streaming_query.py (1)

212-212: SSE response is missing correct content type and streaming headers

Same issue as v2; events may be buffered or ignored.

-        return StreamingResponse(response_generator(response))
+        return StreamingResponse(
+            response_generator(response),
+            media_type="text/event-stream",
+            headers={
+                "Cache-Control": "no-cache",
+                "Connection": "keep-alive",
+                "X-Accel-Buffering": "no",
+            },
+        )
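
A quick client-side check for the fix above, as a sketch (host, port, and payload shape are assumptions; adjust the path for v1 or v2):

# Hypothetical SSE smoke test using httpx; endpoint URL and body are illustrative.
import asyncio
import httpx

async def main() -> None:
    async with httpx.AsyncClient(timeout=None) as client:
        async with client.stream(
            "POST",
            "http://localhost:8080/v1/streaming_query",
            json={"query": "hello"},
        ) as response:
            # With the headers fix, the stream advertises SSE and arrives unbuffered
            assert response.headers["content-type"].startswith("text/event-stream")
            async for line in response.aiter_lines():
                if line:
                    print(line)

asyncio.run(main())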
src/app/endpoints/conversations.py (1)

88-99: Access session_id robustly; returned items may be models, not dicts

Using .get() on a Pydantic/typed object will fail.

-        session_id = str(agent_sessions[0].get("session_id"))
+        first = agent_sessions[0]
+        session_id = (
+            first.get("session_id") if isinstance(first, dict) else getattr(first, "session_id", None)
+        )
+        if not session_id:
+            raise HTTPException(
+                status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
+                detail={"response": "Invalid session object", "cause": "session_id missing"},
+            )
+        session_id = str(session_id)
🧹 Nitpick comments (17)
src/utils/query.py (4)

309-310: Remove unused variable ‘token’.

Minor cleanup.

-    user_id, _, _, token = auth
+    user_id, _, _, _ = auth

338-350: Return 503 to match the documented OpenAPI response for connection errors.

Doc block maps connection failures to 503; handler currently uses 500.

-    raise HTTPException(
-        status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
+    raise HTTPException(
+        status_code=status.HTTP_503_SERVICE_UNAVAILABLE,
         detail={
             "response": "Unable to connect to Llama Stack",
             "cause": str(e),
         },
     ) from e

33-52: Use 401 for Unauthorized in OpenAPI mapping.

400 is misleading for auth failures; map UnauthorizedResponse under 401.

-    400: {
-        "description": "Missing or invalid credentials provided by client",
-        "model": UnauthorizedResponse,
-    },
+    401: {
+        "description": "Missing or invalid credentials provided by client",
+        "model": UnauthorizedResponse,
+    },

258-288: Consider normalizing attachment metadata (lowercasing) before validation.

Prevents case-related rejections of otherwise valid client input.

-    for attachment in attachments:
+    for attachment in attachments:
+        # normalize without mutating original pydantic model
+        atype = attachment.attachment_type.lower()
+        ctype = attachment.content_type.lower()
-        if attachment.attachment_type not in constants.ATTACHMENT_TYPES:
+        if atype not in constants.ATTACHMENT_TYPES:
@@
-        if attachment.content_type not in constants.ATTACHMENT_CONTENT_TYPES:
+        if ctype not in constants.ATTACHMENT_CONTENT_TYPES:
src/app/endpoints/streaming_query_v2.py (2)

96-99: Pass the validated conversation context to model selection hints

Currently always passing None; prefer the actual user_conversation.

-            *evaluate_model_hints(user_conversation=None, query_request=query_request),
+            *evaluate_model_hints(
+                user_conversation=user_conversation, query_request=query_request
+            ),

108-109: Avoid metadata_map shadowing

Outer variable is unused and shadowed by inner definition.

-        metadata_map: dict[str, dict[str, Any]] = {}
+        # metadata captured within response_generator

Also applies to: 128-129

src/app/endpoints/streaming_query.py (1)

5-6: Remove unused regex now that metadata parsing lives in utils.streaming_query

Dead code after centralization.

-import re
-...
-METADATA_PATTERN = re.compile(r"\nMetadata: (\{.+})\n")
+# metadata extraction handled by utils.streaming_query

Also applies to: 59-59

src/utils/streaming_query.py (2)

164-172: Guard error payload shape

chunk.error may not always be a dict with "message".

-            "token": chunk.error["message"],
+            "token": getattr(chunk.error, "message", None)
+            or (chunk.error.get("message") if isinstance(chunk.error, dict) else str(chunk.error)),

80-100: End-event placeholders: consider carrying token counts when available

If tokens are tracked upstream, thread them through metadata_map or function args.

Happy to wire this if you expose counts from metrics/turn payloads.

src/app/endpoints/query_v2.py (2)

100-102: Prune unused params/imports and dead code

provider_id and mcp_headers aren’t used in retrieve_response; imported token-count util also unused; stray blank lines.

-        summary, conversation_id = await retrieve_response(
+        summary, conversation_id = await retrieve_response(
             client,
             llama_stack_model_id,
             query_request,
             token,
-            mcp_headers=mcp_headers,
-            provider_id=provider_id,
+            mcp_headers=mcp_headers,
         )
-    mcp_headers: dict[str, dict[str, str]] | None = None,
-    provider_id: str = "",
+    mcp_headers: dict[str, dict[str, str]] | None = None,
-    model_label = model_id.split("/", 1)[1] if "/" in model_id else model_id
-    #update_llm_token_count_from_turn(response, model_label, provider_id, system_prompt)
+    # Token accounting TBD for Responses API
-from metrics.utils import update_llm_token_count_from_turn

Also collapse extra blank import lines around utils.query.

Also applies to: 140-141, 264-266, 20-21, 31-39


220-233: Guard when response.output is None

Avoid iterating over None.

-    for idx, output_item in enumerate(response.output):
+    for idx, output_item in enumerate(response.output or []):
src/app/endpoints/conversations_v2.py (1)

58-79: Docs/status coverage: add 400/403 error responses.

This handler can raise 400 (invalid ID) and 403 (access denied) via validate_conversation_id/access, but conversation_responses doesn’t document them.

I recommend extending conversation_responses in utils.conversations to include 400 and 403 entries so OpenAPI reflects actual behavior.

src/utils/conversations.py (4)

59-66: OpenAPI example mismatch with model (message vs response).

ConversationDeleteResponse uses the field response, but the 200 example here uses message. This misleads API consumers.

 conversation_delete_responses: dict[int | str, dict[str, Any]] = {
         "conversation_id": "123e4567-e89b-12d3-a456-426614174000",
         "success": True,
-        "message": "Conversation deleted successfully",
+        "response": "Conversation deleted successfully",
     },

31-57: Enhance responses spec: document 400/403 for conversation retrieval.

The retrieval flow can emit:

  • 400 for invalid ID (validate_conversation_id)
  • 403 for unauthorized access (validate_conversation_access)
 conversation_responses: dict[int | str, dict[str, Any]] = {
         "conversation_id": "123e4567-e89b-12d3-a456-426614174000",
         "chat_history": [
             {
                 "messages": [
                     {"content": "Hi", "type": "user"},
                     {"content": "Hello!", "type": "assistant"},
                 ],
                 "started_at": "2024-01-01T00:00:00Z",
                 "completed_at": "2024-01-01T00:00:05Z",
             }
         ],
     },
+    400: {
+        "detail": {
+            "response": "Invalid conversation ID format",
+            "cause": "Conversation ID is not a valid SUID/Response ID",
+        }
+    },
+    403: {
+        "detail": {
+            "response": "Access denied",
+            "cause": "You do not have permission to read this conversation",
+        }
+    },
         "detail": {
             "response": "Conversation not found",
             "cause": "The specified conversation ID does not exist.",
         }
     },

109-149: Guard against None output messages in simplified history.

Currently appends an output message even when output_message is missing, producing {"content": None, "type": None}.

-    # Extract only essential data from each turn
+    # Extract only essential data from each turn
     for turn in session_data.get("turns", []):
         # Clean up input messages
         cleaned_messages = []
         for msg in turn.get("input_messages", []):
             cleaned_msg = {
                 "content": msg.get("content"),
                 "type": msg.get("role"),  # Rename role to type
             }
             cleaned_messages.append(cleaned_msg)

         # Clean up output message
         output_msg = turn.get("output_message", {})
-        cleaned_messages.append(
-            {
-                "content": output_msg.get("content"),
-                "type": output_msg.get("role"),  # Rename role to type
-            }
-        )
+        if output_msg and (output_msg.get("content") is not None):
+            cleaned_messages.append(
+                {
+                    "content": output_msg.get("content"),
+                    "type": output_msg.get("role"),  # Rename role to type
+                }
+            )

210-221: Clarify 400 error message for non-UUID IDs.

Since check_suid accepts opaque resp-/resp_ IDs, the current cause text “not a valid UUID” can be misleading.

-                "cause": f"Conversation ID {conversation_id} is not a valid UUID",
+                "cause": f"Conversation ID {conversation_id} is not a valid SUID/Response ID",
src/app/endpoints/query.py (1)

116-120: Align upstream-connection status codes across endpoints.

handle_api_connection_error() raises 500, while conversation deletion maps APIConnectionError to 503. Pick one (prefer 503 Service Unavailable for upstream outages) and standardize.

Would you like me to open a follow-up to change handle_api_connection_error() to raise 503 and update OpenAPI docs accordingly?
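
For reference, a minimal sketch of the standardized handler, assuming 503 is chosen for upstream outages (the detail shape mirrors the existing handler):

# Sketch only; not the committed implementation.
from fastapi import HTTPException, status

def handle_api_connection_error(e: Exception) -> None:
    """Map an upstream APIConnectionError to 503 Service Unavailable."""
    raise HTTPException(
        status_code=status.HTTP_503_SERVICE_UNAVAILABLE,
        detail={
            "response": "Unable to connect to Llama Stack",
            "cause": str(e),
        },
    ) from e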

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b14d91c and c3d8748.

📒 Files selected for processing (13)
  • pyproject.toml (1 hunks)
  • src/app/endpoints/conversations.py (7 hunks)
  • src/app/endpoints/conversations_v2.py (1 hunks)
  • src/app/endpoints/query.py (6 hunks)
  • src/app/endpoints/query_v2.py (1 hunks)
  • src/app/endpoints/streaming_query.py (2 hunks)
  • src/app/endpoints/streaming_query_v2.py (1 hunks)
  • src/app/routers.py (2 hunks)
  • src/constants.py (1 hunks)
  • src/utils/conversations.py (1 hunks)
  • src/utils/query.py (1 hunks)
  • src/utils/streaming_query.py (1 hunks)
  • src/utils/suid.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (10)
src/app/endpoints/streaming_query_v2.py (8)
src/auth/__init__.py (1)
  • get_auth_dependency (14-43)
src/client.py (2)
  • AsyncLlamaStackClientHolder (18-55)
  • get_client (49-55)
src/utils/endpoints.py (3)
  • check_configuration_loaded (62-68)
  • get_system_prompt (71-102)
  • validate_model_provider_override (105-125)
src/utils/query.py (5)
  • evaluate_model_hints (94-133)
  • is_transcripts_enabled (55-61)
  • persist_user_conversation_details (64-91)
  • select_model_and_provider_id (136-213)
  • validate_attachments_metadata (258-287)
src/utils/streaming_query.py (3)
  • format_stream_data (26-37)
  • stream_start_event (40-61)
  • stream_end_event (64-100)
src/utils/transcripts.py (1)
  • store_transcript (33-86)
src/utils/types.py (2)
  • TurnSummary (59-78)
  • ToolCallSummary (43-56)
src/app/endpoints/query_v2.py (2)
  • get_rag_tools (274-283)
  • get_mcp_tools (286-304)
src/utils/streaming_query.py (4)
src/utils/types.py (2)
  • TurnSummary (59-78)
  • append_tool_calls_from_llama (65-78)
src/utils/endpoints.py (1)
  • get_system_prompt (71-102)
src/metrics/utils.py (1)
  • update_llm_token_count_from_turn (60-77)
src/app/endpoints/streaming_query.py (1)
  • response_generator (139-200)
src/utils/conversations.py (8)
src/client.py (2)
  • AsyncLlamaStackClientHolder (18-55)
  • get_client (49-55)
src/configuration.py (1)
  • configuration (60-64)
src/app/database.py (1)
  • get_session (34-40)
src/models/config.py (2)
  • config (130-136)
  • Action (303-342)
src/models/database/conversations.py (1)
  • UserConversation (11-36)
src/models/responses.py (4)
  • ConversationDeleteResponse (425-458)
  • ConversationsListResponse (493-550)
  • ConversationDetails (461-490)
  • UnauthorizedResponse (343-357)
src/utils/endpoints.py (3)
  • check_configuration_loaded (62-68)
  • delete_conversation (22-36)
  • validate_conversation_ownership (39-59)
src/utils/suid.py (1)
  • check_suid (15-44)
src/app/routers.py (1)
tests/unit/app/test_routers.py (1)
  • include_router (31-46)
src/app/endpoints/streaming_query.py (3)
src/utils/endpoints.py (2)
  • validate_model_provider_override (105-125)
  • validate_conversation_ownership (39-59)
src/utils/query.py (1)
  • select_model_and_provider_id (136-213)
src/utils/streaming_query.py (4)
  • format_stream_data (26-37)
  • stream_start_event (40-61)
  • stream_end_event (64-100)
  • stream_build_event (103-146)
src/app/endpoints/conversations_v2.py (5)
src/auth/__init__.py (1)
  • get_auth_dependency (14-43)
src/authorization/middleware.py (1)
  • authorize (111-122)
src/models/config.py (2)
  • config (130-136)
  • Action (303-342)
src/models/responses.py (3)
  • ConversationDeleteResponse (425-458)
  • ConversationResponse (375-422)
  • ConversationsListResponse (493-550)
src/utils/conversations.py (4)
  • get_conversations_list_base (151-207)
  • delete_conversation_base (262-360)
  • validate_conversation_id (210-220)
  • validate_conversation_access (223-259)
src/utils/query.py (9)
src/configuration.py (4)
  • configuration (60-64)
  • user_data_collection_configuration (81-85)
  • inference (121-125)
  • llama_stack_configuration (74-78)
src/app/database.py (1)
  • get_session (34-40)
src/models/config.py (2)
  • config (130-136)
  • Action (303-342)
src/models/database/conversations.py (1)
  • UserConversation (11-36)
src/models/requests.py (2)
  • QueryRequest (72-222)
  • Attachment (15-69)
src/models/responses.py (2)
  • UnauthorizedResponse (343-357)
  • ForbiddenResponse (360-372)
src/utils/endpoints.py (3)
  • check_configuration_loaded (62-68)
  • validate_conversation_ownership (39-59)
  • validate_model_provider_override (105-125)
src/utils/transcripts.py (1)
  • store_transcript (33-86)
src/utils/types.py (1)
  • TurnSummary (59-78)
src/app/endpoints/query_v2.py (8)
src/client.py (2)
  • AsyncLlamaStackClientHolder (18-55)
  • get_client (49-55)
src/metrics/utils.py (1)
  • update_llm_token_count_from_turn (60-77)
src/models/requests.py (1)
  • QueryRequest (72-222)
src/models/responses.py (1)
  • QueryResponse (48-79)
src/utils/endpoints.py (3)
  • check_configuration_loaded (62-68)
  • get_system_prompt (71-102)
  • validate_model_provider_override (105-125)
src/utils/mcp_headers.py (1)
  • mcp_headers_dependency (15-26)
src/utils/query.py (4)
  • evaluate_model_hints (94-133)
  • process_transcript_and_persist_conversation (352-383)
  • select_model_and_provider_id (136-213)
  • validate_attachments_metadata (258-287)
src/utils/types.py (2)
  • TurnSummary (59-78)
  • ToolCallSummary (43-56)
src/app/endpoints/query.py (1)
src/utils/query.py (8)
  • evaluate_model_hints (94-133)
  • select_model_and_provider_id (136-213)
  • is_input_shield (242-255)
  • is_output_shield (229-239)
  • validate_attachments_metadata (258-287)
  • validate_query_request (290-335)
  • handle_api_connection_error (338-349)
  • process_transcript_and_persist_conversation (352-383)
src/app/endpoints/conversations.py (3)
src/utils/conversations.py (5)
  • simplify_session_data (109-148)
  • get_conversations_list_base (151-207)
  • validate_conversation_id (210-220)
  • validate_conversation_access (223-259)
  • delete_conversation_base (262-360)
src/models/config.py (1)
  • Action (303-342)
src/client.py (2)
  • AsyncLlamaStackClientHolder (18-55)
  • get_client (49-55)
🔇 Additional comments (7)
src/constants.py (1)

5-5: Version cap bump looks good. Ensure CI exercises 0.2.21 explicitly.

src/app/routers.py (1)

17-20: Mounting V2 under /v2 while preserving V1 looks good.

Route composition is clear; ordering avoids conflicts. Please ensure unit tests assert presence of the new /v2 routes.

Also applies to: 41-44

src/app/endpoints/conversations.py (1)

42-43: LGTM: list endpoint delegates cleanly to shared base

Good reuse and RBAC is handled upstream.

src/app/endpoints/query_v2.py (1)

274-304: LGTM: tool adapters for Responses API

Straightforward mapping for RAG and MCP tools; debug logging is helpful.

src/app/endpoints/conversations_v2.py (1)

48-56: LGTM: list conversations v2 delegates to shared base cleanly.

src/app/endpoints/query.py (2)

63-68: LGTM: clear docstring about Agent API usage.


102-109: Nice: centralizing transcript persistence.

Good separation via process_transcript_and_persist_conversation() and consistent metrics updates.

Comment on lines 32 to 43
async def get_response_sessions(_client, _conversation_id: str):
    """Placeholder for Response API session listing.

    For now, we rely on the shared base which works without direct
    backend calls until a dedicated Responses API endpoint is wired.
    """
    return [
        {"session_id": _conversation_id},
    ]

Contributor

⚠️ Potential issue

Fix: session helper functions must be synchronous (they’re called without await).

delete_conversation_base() invokes get_session_func synchronously. Defining these as async returns coroutines and breaks truthiness checks and deletion flow.

Apply this diff:

-async def get_response_sessions(_client, _conversation_id: str):
+def get_response_sessions(_client, _conversation_id: str):
     """Placeholder for Response API session listing.

     For now, we rely on the shared base which works without direct
     backend calls until a dedicated Responses API endpoint is wired.
     """
     return [
         {"session_id": _conversation_id},
     ]
🤖 Prompt for AI Agents
In src/app/endpoints/conversations_v2.py around lines 32 to 41, the session
helper is defined as async but is called synchronously by
delete_conversation_base; change the function to a regular synchronous function
(replace "async def" with "def"), keep the same return value (a list of dicts),
update the docstring to remove async wording, and ensure no callers await this
function so it returns a concrete list instead of a coroutine.

Comment on lines 43 to 50
async def delete_response_sessions(_client, _conversation_id: str, _sessions):
    """Placeholder for Response API session deletion."""
    return None

Contributor

⚠️ Potential issue

Same issue: make deletion helper synchronous.

Matches the calling convention in delete_conversation_base().

-async def delete_response_sessions(_client, _conversation_id: str, _sessions):
+def delete_response_sessions(_client, _conversation_id: str, _sessions):
     """Placeholder for Response API session deletion."""
     return None
🤖 Prompt for AI Agents
In src/app/endpoints/conversations_v2.py around lines 43 to 46, the helper
delete_response_sessions is currently async but should be synchronous to match
the calling convention in delete_conversation_base(); change the function to a
regular def, remove async/await usage, and return None (or perform synchronous
deletion logic) so callers expecting a synchronous function get a normal return
value instead of a coroutine.

Comment on lines 154 to 164
"""Delete agent sessions for a conversation."""
agent_id = conversation_id
session_id = str(sessions[0].get("session_id"))
await client.agents.session.delete(agent_id=agent_id, session_id=session_id)
Contributor

⚠️ Potential issue

Same issue when deleting sessions

Avoid .get on possibly typed objects.

-    session_id = str(sessions[0].get("session_id"))
+    first = sessions[0]
+    session_id = first.get("session_id") if isinstance(first, dict) else getattr(first, "session_id", None)
+    if not session_id:
+        raise RuntimeError("session_id missing on session")
+    session_id = str(session_id)
🤖 Prompt for AI Agents
In src/app/endpoints/conversations.py around lines 154 to 157, the code uses
sessions[0].get("session_id") which relies on .get on a potentially typed
object; replace this with explicit, type-safe access: first validate that
sessions is a non-empty list and that sessions[0] is the expected type, then
extract the session_id with the appropriate accessor (e.g.,
sessions[0]["session_id"] if it's a dict or sessions[0].session_id if it's an
object), handle missing/invalid cases by raising a clear error or returning
early, and pass the resulting stringified session_id to
client.agents.session.delete.

Comment on lines 55 to 62
def is_transcripts_enabled() -> bool:
    """Check if transcripts is enabled.

    Returns:
        bool: True if transcripts is enabled, False otherwise.
    """
    return configuration.user_data_collection_configuration.transcripts_enabled

Contributor

⚠️ Potential issue

Call configuration accessors; current code treats methods as attributes and will fail at runtime.

These are methods (see configuration.py). This will raise AttributeError.

-    return configuration.user_data_collection_configuration.transcripts_enabled
+    return configuration.user_data_collection_configuration().transcripts_enabled
🤖 Prompt for AI Agents
In src/utils/query.py around lines 55 to 62, the code treats
configuration.user_data_collection_configuration as an attribute but it's a
method; change the return to call the accessor (i.e., return
configuration.user_data_collection_configuration().transcripts_enabled) so it
invokes the method and returns the boolean; ensure the function signature/type
hints remain bool and consider adding a defensive try/except or logging if
configuration could be None or the call may raise.

Comment on lines 136 to 214
def select_model_and_provider_id(
    models: ModelListResponse, model_id: str | None, provider_id: str | None
) -> tuple[str, str, str]:
    """
    Select the model ID and provider ID based on the request or available models.

    Determine and return the appropriate model and provider IDs for
    a query request.

    If the request specifies both model and provider IDs, those are used.
    Otherwise, defaults from configuration are applied. If neither is
    available, selects the first available LLM model from the provided model
    list. Validates that the selected model exists among the available models.

    Returns:
        A tuple containing the combined model ID (in the format
        "provider/model") and the provider ID.

    Raises:
        HTTPException: If no suitable LLM model is found or the selected model is not available.
    """
    # If model_id and provider_id are provided in the request, use them

    # If model_id is not provided in the request, check the configuration
    if not model_id or not provider_id:
        logger.debug(
            "No model ID or provider ID specified in request, checking configuration"
        )
        model_id = configuration.inference.default_model  # type: ignore[reportAttributeAccessIssue]
        provider_id = (
            configuration.inference.default_provider  # type: ignore[reportAttributeAccessIssue]
        )

    # If no model is specified in the request or configuration, use the first available LLM
    if not model_id or not provider_id:
        logger.debug(
            "No model ID or provider ID specified in request or configuration, "
            "using the first available LLM"
        )
        try:
            model = next(
                m
                for m in models
                if m.model_type == "llm"  # pyright: ignore[reportAttributeAccessIssue]
            )
            model_id = model.identifier
            provider_id = model.provider_id
            logger.info("Selected model: %s", model)
            return model_id, model_id, provider_id
        except (StopIteration, AttributeError) as e:
            message = "No LLM model found in available models"
            logger.error(message)
            raise HTTPException(
                status_code=status.HTTP_400_BAD_REQUEST,
                detail={
                    "response": constants.UNABLE_TO_PROCESS_RESPONSE,
                    "cause": message,
                },
            ) from e

    llama_stack_model_id = f"{provider_id}/{model_id}"
    # Validate that the model_id and provider_id are in the available models
    logger.debug("Searching for model: %s, provider: %s", model_id, provider_id)
    if not any(
        m.identifier == llama_stack_model_id and m.provider_id == provider_id
        for m in models
    ):
        message = f"Model {model_id} from provider {provider_id} not found in available models"
        logger.error(message)
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail={
                "response": constants.UNABLE_TO_PROCESS_RESPONSE,
                "cause": message,
            },
        )

    return llama_stack_model_id, model_id, provider_id

Contributor

⚠️ Potential issue

Fix model/provider resolution: wrong early return tuple and improper config accessor usage.

  • Early return sets both first and second elements to the combined identifier.
  • Config accessors should be called.
-    if not model_id or not provider_id:
+    if not model_id or not provider_id:
         logger.debug(
             "No model ID or provider ID specified in request, checking configuration"
         )
-        model_id = configuration.inference.default_model  # type: ignore[reportAttributeAccessIssue]
-        provider_id = (
-            configuration.inference.default_provider  # type: ignore[reportAttributeAccessIssue]
-        )
+        model_id = configuration.inference().default_model
+        provider_id = configuration.inference().default_provider
@@
-        try:
+        try:
             model = next(
                 m
                 for m in models
                 if m.model_type == "llm"  # pyright: ignore[reportAttributeAccessIssue]
             )
-            model_id = model.identifier
-            provider_id = model.provider_id
-            logger.info("Selected model: %s", model)
-            return model_id, model_id, provider_id
+            llama_stack_model_id = model.identifier
+            provider_id = model.provider_id
+            # Derive plain model id when identifier is "provider/model"
+            model_id = llama_stack_model_id.split("/", 1)[1] if "/" in llama_stack_model_id else llama_stack_model_id
+            logger.info("Selected model: %s", llama_stack_model_id)
+            return llama_stack_model_id, model_id, provider_id

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/utils/query.py around lines 136–214, the function improperly overwrites
missing model/provider by unconditionally assigning config attributes without
calling them, and returns the wrong tuple when selecting the first LLM (it
returns model_id twice). Fix by calling the config accessor (e.g.
configuration.inference().default_model and
configuration.inference().default_provider) and only assign each of model_id or
provider_id when that specific value is missing, then when selecting the first
available LLM construct the combined id (provider/model) and return
(combined_id, model_id, provider_id) so the returned tuple is
(llama_stack_model_id, model_id, provider_id).

Comment on lines 307 to 308
logger.info("Llama stack config: %s", configuration.llama_stack_configuration)

Contributor

⚠️ Potential issue

Avoid logging the full Llama Stack config; may include sensitive data. Also call the accessor.

Log only safe fields (e.g., version/provider list), not the entire object.

-    logger.info("Llama stack config: %s", configuration.llama_stack_configuration)
+    cfg = configuration.llama_stack_configuration()
+    logger.info("Llama Stack version: %s; providers: %s", getattr(cfg, "version", "n/a"), getattr(cfg, "providers", "n/a"))
🤖 Prompt for AI Agents
In src/utils/query.py around lines 307-308, the code currently logs the entire
Llama stack configuration object, which may contain sensitive data; instead call
the configuration accessor (configuration.llama_stack_configuration()) and log
only safe fields such as its version and provider list, formatting a concise
message like "Llama stack config: version=%s providers=%s" using those values;
remove or replace the existing logger.info invocation so full object contents
are not written to logs.

Comment on lines 16 to 37
"""Check if given string is a proper session ID or response ID.
Args:
suid: The string to check.
Returns True if the string is a valid UUID, False otherwise.
Rules:
- If it starts with "resp-" or "resp_", accept as a valid Responses API ID (opaque).
- Otherwise, require a valid UUID string.
"""
if not isinstance(suid, str) or not suid:
return False

# Handle Responses API IDs
if suid.startswith("resp-") or suid.startswith("resp_"):
token = suid[5:]
if not token:
return False
# If truncated (e.g., shell cut reduced length), pad to canonical UUID length
if len(token) < 36:
token = token + ("0" * (36 - len(token)))
try:
uuid.UUID(token)
return True
except (ValueError, TypeError):
return False
Contributor

⚠️ Potential issue

Don’t coerce opaque resp-* IDs into UUIDs; the zero-padding path is incorrect and unsafe.

Response API IDs are opaque; padding/truncation can accept malformed IDs and reject valid ones.

 def check_suid(suid: str) -> bool:
-    """Check if given string is a proper session ID or response ID.
+    """Check if given string is a proper session ID or response ID.
@@
-    if not isinstance(suid, str) or not suid:
+    if not isinstance(suid, str) or not suid:
         return False
-
-    # Handle Responses API IDs
-    if suid.startswith("resp-") or suid.startswith("resp_"):
-        token = suid[5:]
-        if not token:
-            return False
-        # If truncated (e.g., shell cut reduced length), pad to canonical UUID length
-        if len(token) < 36:
-            token = token + ("0" * (36 - len(token)))
-        try:
-            uuid.UUID(token)
-            return True
-        except (ValueError, TypeError):
-            return False
+    # Handle Responses API IDs – treat as opaque, minimally validated
+    if suid.startswith(("resp-", "resp_")):
+        token = suid[5:]
+        # Require a non-empty token with sane charset
+        return bool(token) and all(ch.isalnum() or ch in "-_." for ch in token)
@@
     try:
         uuid.UUID(suid)
         return True
     except (ValueError, TypeError):
         return False
🤖 Prompt for AI Agents
In src/utils/suid.py around lines 16 to 37, the code currently treats resp- or
resp_ IDs as if they were UUIDs by zero-padding truncated tokens and attempting
uuid.UUID(token); remove that behavior and treat Responses API IDs as opaque: if
suid is a non-empty string and starts with "resp-" or "resp_", simply return
True when the token after the prefix is non-empty (do not pad or call
uuid.UUID); for all other values keep the existing UUID validation path (i.e.,
require a valid UUID string). Ensure type/empty checks remain and do not
introduce any coercion or padding logic.
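
If the opaque-ID variant above is adopted, a few parametrized tests would pin the behavior down. A sketch; the allowed charset is an assumption taken from the suggested diff:

# Hypothetical tests against the reviewer's suggested check_suid.
import pytest

from utils.suid import check_suid

@pytest.mark.parametrize(
    "value,expected",
    [
        ("123e4567-e89b-12d3-a456-426614174000", True),  # plain UUID still valid
        ("resp-abc123", True),   # opaque Responses API ID
        ("resp_", False),        # empty token after prefix
        ("resp-abc/123", False), # character outside the allowed set
        ("", False),             # empty string rejected by the input guard
    ],
)
def test_check_suid(value: str, expected: bool) -> None:
    assert check_suid(value) is expected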

Contributor

@coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/app/endpoints/query.py (3)

200-217: Security: avoid sending Bearer tokens to non‑HTTPS MCP servers by default.

The code forwards Authorization: Bearer <token> to every configured MCP server when headers are empty. This risks credential exposure if any server uses http:// or is misconfigured.

Apply:

+from urllib.parse import urlparse
@@
-        if not mcp_headers and token:
-            for mcp_server in configuration.mcp_servers:
-                mcp_headers[mcp_server.url] = {
-                    "Authorization": f"Bearer {token}",
-                }
+        if not mcp_headers and token:
+            for mcp_server in configuration.mcp_servers:
+                url = mcp_server.url
+                scheme = urlparse(url).scheme.lower()
+                if scheme != "https":
+                    logger.warning("Skipping non-HTTPS MCP server for auth header: %s", url)
+                    continue
+                mcp_headers[url] = {"Authorization": f"Bearer {token}"}

Optionally gate this behind a config flag (default off) and/or send tokens only to whitelisted servers that explicitly require auth.


178-182: Attachments privacy TODO is a compliance risk; prioritize redaction.

There’s a TODO to redact attachments before sending to the LLM. If attachments can include PII or secrets, add at least basic redaction or a feature flag to block unredacted uploads.

I can draft a pluggable redactor interface and a default regex‑based redactor for common PII (emails, phones, SSNs), gated by config.


54-56: Fix query_response to a valid FastAPI/OpenAPI responses mapping

src/utils/query.py defines query_response but several entries aren’t in FastAPI’s expected shape:

  • 200 is a raw example body (conversation_id/response) instead of a response mapping — use "model": QueryResponse or "content": {"application/json": {"example": {...}}}.
  • 503 uses a "detail" key with nested payload; replace with "description" and/or "content" (or a "model") to match FastAPI.
  • 400/403 already include "model" (OK).

Location: src/utils/query.py — query_response definition.
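
A sketch of the conformant shape, assuming QueryResponse is the 200 model (the 503 example body is illustrative):

# Hypothetical FastAPI-shaped responses mapping for query_response.
from typing import Any

from models.responses import QueryResponse

query_response: dict[int | str, dict[str, Any]] = {
    200: {"model": QueryResponse},
    503: {
        "description": "Unable to connect to Llama Stack",
        "content": {
            "application/json": {
                "example": {
                    "detail": {
                        "response": "Unable to connect to Llama Stack",
                        "cause": "Connection refused",
                    }
                }
            }
        },
    },
}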

♻️ Duplicate comments (6)
src/app/endpoints/conversations.py (3)

100-100: Typed vs dict session object: .get will fail at runtime.
client.agents.session.list(...) likely returns typed models, not dicts. Accessing via .get risks AttributeError. Use type-safe access with a fallback and validate presence.

Apply this diff:

-        session_id = str(agent_sessions[0].get("session_id"))
+        first = agent_sessions[0]
+        session_id = (
+            first.get("session_id") if isinstance(first, dict) else getattr(first, "session_id", None)
+        )
+        if not session_id:
+            raise RuntimeError("session_id missing on session")
+        session_id = str(session_id)

155-159: Same typed vs dict issue on delete; also consider multiple sessions.
Using .get may break; fix similarly. Optionally, delete all sessions for the conversation instead of just the first.

Apply this minimal fix:

-    session_id = str(sessions[0].get("session_id"))
-    await client.agents.session.delete(agent_id=agent_id, session_id=session_id)
+    first = sessions[0]
+    session_id = (
+        first.get("session_id") if isinstance(first, dict) else getattr(first, "session_id", None)
+    )
+    if not session_id:
+        raise RuntimeError("session_id missing on session")
+    await client.agents.session.delete(agent_id=agent_id, session_id=str(session_id))

Optional improvement (delete all sessions):

-    first = sessions[0]
-    session_id = (
-        first.get("session_id") if isinstance(first, dict) else getattr(first, "session_id", None)
-    )
-    if not session_id:
-        raise RuntimeError("session_id missing on session")
-    await client.agents.session.delete(agent_id=agent_id, session_id=str(session_id))
+    for s in sessions:
+        sid = s.get("session_id") if isinstance(s, dict) else getattr(s, "session_id", None)
+        if not sid:
+            continue
+        await client.agents.session.delete(agent_id=agent_id, session_id=str(sid))

182-188: Awaiting a sync base + passing async helpers will crash.
delete_conversation_base is synchronous in utils.conversations, yet you await it and pass async helpers; this will raise at runtime and the async helpers won’t be awaited inside the base.

Preferred fix: make the base async and await the helpers; keep this call site as await.

Patch (in src/utils/conversations.py):

-def delete_conversation_base(
+async def delete_conversation_base(
     request: Request,
     conversation_id: str,
     auth: Any,
-    get_session_func: callable,
-    delete_session_func: callable,
+    get_session_func: callable,
+    delete_session_func: callable,
 ) -> ConversationDeleteResponse:
@@
-        sessions = get_session_func(client, conversation_id)
+        sessions = await get_session_func(client, conversation_id)
@@
-        delete_session_func(client, conversation_id, sessions)
+        await delete_session_func(client, conversation_id, sessions)

If you must keep the base sync, then do not await it here and instead provide sync wrappers that offload the async client calls, or run the entire base in a threadpool. Async base is cleaner.

Run this to confirm current signature and callers:

#!/bin/bash
rg -nC2 -P '\bdef\s+delete_conversation_base\s*\(' src | sed -n '1,120p'
rg -nP 'await\s+delete_conversation_base\(' src | sed -n '1,200p'
pyproject.toml (1)

31-32: Add upper bounds for llama-stack(+client) to match MAXIMAL_SUPPORTED_LLAMA_STACK_VERSION (0.2.21).

Prevents unsupported upgrades that will be rejected at runtime by the version gate.

-    "llama-stack>=0.2.19",
-    "llama-stack-client>=0.2.19",
+    "llama-stack>=0.2.19,<=0.2.21",
+    "llama-stack-client>=0.2.19,<=0.2.21",
src/utils/suid.py (1)

16-37: Treat resp-* IDs as opaque; remove UUID padding/coercion.

Zero‑padding and UUID parsing can accept malformed IDs and reject valid ones.

 def check_suid(suid: str) -> bool:
@@
-    # Handle Responses API IDs
-    if suid.startswith("resp-") or suid.startswith("resp_"):
-        token = suid[5:]
-        if not token:
-            return False
-        # If truncated (e.g., shell cut reduced length), pad to canonical UUID length
-        if len(token) < 36:
-            token = token + ("0" * (36 - len(token)))
-        try:
-            uuid.UUID(token)
-            return True
-        except (ValueError, TypeError):
-            return False
+    # Handle Responses API IDs – treat as opaque
+    if suid.startswith(("resp-", "resp_")):
+        token = suid[5:]
+        return bool(token) and all(ch.isalnum() or ch in "-_." for ch in token)
src/app/endpoints/query_v2.py (1)

25-29: Validate conversation ownership before chaining via previous_response_id.

Prevents cross‑user context leakage; mirrors v1 parity.

 from utils.endpoints import (
     check_configuration_loaded,
     get_system_prompt,
     validate_model_provider_override,
+    validate_conversation_ownership,
 )
@@
-    user_conversation: UserConversation | None = None
-    if query_request.conversation_id:
-        # TODO: Implement conversation once Llama Stack supports its API
-        pass
+    user_conversation: UserConversation | None = None
+    if query_request.conversation_id:
+        user_conversation = validate_conversation_ownership(
+            user_id=user_id, conversation_id=query_request.conversation_id
+        )
+        if user_conversation is None:
+            raise HTTPException(
+                status_code=status.HTTP_403_FORBIDDEN,
+                detail={
+                    "response": "Access denied",
+                    "cause": "You do not have permission to access this conversation",
+                },
+            )

Also applies to: 79-83

🧹 Nitpick comments (11)
src/utils/streaming_query.py (2)

80-100: Use resilient keys for doc URL in end event.

Metadata may use "doc_url" or "docs_url". Accept both to avoid silent drops.

-                "referenced_documents": [
-                    {
-                        "doc_url": v["docs_url"],
-                        "doc_title": v["title"],
-                    }
-                    for v in filter(
-                        lambda v: ("docs_url" in v) and ("title" in v),
-                        metadata_map.values(),
-                    )
-                ],
+                "referenced_documents": [
+                    {
+                        "doc_url": v.get("docs_url") or v.get("doc_url"),
+                        "doc_title": v["title"],
+                    }
+                    for v in filter(
+                        lambda v: (("docs_url" in v) or ("doc_url" in v)) and ("title" in v),
+                        metadata_map.values(),
+                    )
+                ],

304-329: Handle dict-shaped tool_call deltas.

Some providers emit tool_call deltas as dicts; add a safe branch.

-        if chunk.event.payload.delta.type == "tool_call":
-            if isinstance(chunk.event.payload.delta.tool_call, str):
+        if chunk.event.payload.delta.type == "tool_call":
+            if isinstance(chunk.event.payload.delta.tool_call, str):
                 ...
-            elif isinstance(chunk.event.payload.delta.tool_call, ToolCall):
+            elif isinstance(chunk.event.payload.delta.tool_call, ToolCall):
                 ...
+            elif isinstance(chunk.event.payload.delta.tool_call, dict):
+                name = (
+                    chunk.event.payload.delta.tool_call.get("tool_name")
+                    or chunk.event.payload.delta.tool_call.get("function", {}).get("name")
+                    or "tool_call"
+                )
+                yield format_stream_data(
+                    {"event": "tool_call",
+                     "data": {"id": chunk_id, "role": chunk.event.payload.step_type, "token": name}}
+                )
src/app/endpoints/query_v2.py (4)

197-206: Guard previous_response_id to only pass resp-* IDs (avoid mixing v1 UUIDs).

Prevents API errors when a v1 conversation UUID is supplied.

-    response = await client.responses.create(
+    prev_resp_id = (
+        query_request.conversation_id
+        if query_request.conversation_id and query_request.conversation_id.startswith(("resp-", "resp_"))
+        else None
+    )
+    response = await client.responses.create(
         input=query_request.query,
         model=model_id,
         instructions=system_prompt,
-        previous_response_id=query_request.conversation_id,
+        previous_response_id=prev_resp_id,
         tools=tools if tools else None,
         stream=False,
         store=True,
     )

233-235: Lower response-content logging to DEBUG to reduce sensitive data in logs.

-                logger.info("Model response content: '%s'",
+                logger.debug("Model response content: '%s'",
                            llm_response[:200] + "..." if len(llm_response) > 200 else llm_response)

190-195: Propagate per-server MCP headers into tool definitions.

The dependency already provides headers; merge them with Authorization.

-        mcp_tools = get_mcp_tools(configuration.mcp_servers, token)
+        mcp_tools = get_mcp_tools(configuration.mcp_servers, token, mcp_headers)
-def get_mcp_tools(mcp_servers: list, token: str | None = None) -> list[dict]:
+def get_mcp_tools(
+    mcp_servers: list,
+    token: str | None = None,
+    mcp_headers: dict[str, dict[str, str]] | None = None,
+) -> list[dict]:
@@
-        # Add authentication if token provided (Response API format)
-        if token:
-            tool_def["headers"] = {
-                "Authorization": f"Bearer {token}"
-            }
+        # Compose headers: Authorization + per-server forwarded headers
+        headers: dict[str, str] = {}
+        if token:
+            headers["Authorization"] = f"Bearer {token}"
+        if mcp_headers and mcp_server.name in mcp_headers:
+            headers.update(mcp_headers[mcp_server.name])
+        if headers:
+            tool_def["headers"] = headers

Also applies to: 284-303


6-11: Avoid runtime dependency on server-side types for casting.

Use TYPE_CHECKING to keep imports type-only and prevent version coupling with llama_stack server package at runtime.

-from llama_stack.apis.agents.openai_responses import (
-    OpenAIResponseObject,
-)
+from typing import TYPE_CHECKING
+if TYPE_CHECKING:  # type-only to avoid runtime coupling
+    from llama_stack_client.types.responses import OpenAIResponseObject  # pragma: no cover
@@
-    response = cast(OpenAIResponseObject, response)
+    response = cast("OpenAIResponseObject", response)  # type: ignore[name-defined]

Also applies to: 207-208

src/app/endpoints/query.py (5)

85-90: Cache model listing to cut per‑request latency.

await client.models.list() on every call can be expensive. Add a short TTL cache (e.g., 30–60s) in the client holder or utils to avoid repeated network lists in hot paths.
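
As a sketch, assuming an async client whose models.list() is awaitable (the helper name and 30s TTL below are illustrative), a module-level TTL cache could look like this; the same pattern would fit the per-request vector DB listing flagged further down:

import time

_MODELS_CACHE: tuple[float, list] | None = None  # hypothetical module-level cache
_MODELS_TTL_SECONDS = 30.0

async def list_models_cached(client) -> list:
    """Return client.models.list() results, re-fetching at most once per TTL window."""
    global _MODELS_CACHE  # pylint: disable=global-statement
    now = time.monotonic()
    if _MODELS_CACHE is not None and now - _MODELS_CACHE[0] < _MODELS_TTL_SECONDS:
        return _MODELS_CACHE[1]
    models = await client.models.list()
    _MODELS_CACHE = (now, models)
    return models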


91-99: Name clarity: pass stack model id as such.

You pass llama_stack_model_id into retrieve_response(model_id=...). Consider renaming the parameter in retrieve_response to stack_model_id (and the callsite) to avoid confusion with model_id used for metrics/transcripts.


158-165: Avoid double network call to list shields.

await client.shields.list() is invoked twice. Fetch once and derive both input/output lists.

Apply:

-    available_input_shields = [
-        shield.identifier
-        for shield in filter(is_input_shield, await client.shields.list())
-    ]
-    available_output_shields = [
-        shield.identifier
-        for shield in filter(is_output_shield, await client.shields.list())
-    ]
+    shields = await client.shields.list()
+    available_input_shields = [s.identifier for s in filter(is_input_shield, shields)]
+    available_output_shields = [s.identifier for s in filter(is_output_shield, shields)]

175-176: Don’t log full system prompts.

System prompts may include sensitive policy text or secrets. Log a hash or length, or truncate to a small prefix.

Apply:

-    logger.debug("Using system prompt: %s", system_prompt)
+    logger.debug("Using system prompt (len=%d, sha256=%s)", len(system_prompt), __import__("hashlib").sha256(system_prompt.encode()).hexdigest()[:12])

218-224: Confirm typing; avoid per-request vector DB listing.

  • Typing: Toolgroup = Union[str, ToolgroupAgentToolGroupWithArgs]; agent.create_turn accepts List[Toolgroup], so mixing plain toolgroup id strings and structured dicts is valid — no change required.
  • Performance: calling await client.vector_dbs.list() on every request (src/app/endpoints/query.py: lines 218–224) can add latency. Cache the vector DB IDs with a short TTL or fetch them only when RAG is enabled/needed.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c3d8748 and 692ede7.

📒 Files selected for processing (13)
  • pyproject.toml (1 hunks)
  • src/app/endpoints/conversations.py (7 hunks)
  • src/app/endpoints/conversations_v2.py (1 hunks)
  • src/app/endpoints/query.py (6 hunks)
  • src/app/endpoints/query_v2.py (1 hunks)
  • src/app/endpoints/streaming_query.py (2 hunks)
  • src/app/endpoints/streaming_query_v2.py (1 hunks)
  • src/app/routers.py (2 hunks)
  • src/constants.py (1 hunks)
  • src/utils/conversations.py (1 hunks)
  • src/utils/query.py (1 hunks)
  • src/utils/streaming_query.py (1 hunks)
  • src/utils/suid.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • src/app/routers.py
  • src/utils/conversations.py
  • src/utils/query.py
  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/conversations_v2.py
🧰 Additional context used
🧬 Code graph analysis (4)
src/utils/streaming_query.py (4)
src/utils/types.py (2)
  • TurnSummary (59-78)
  • append_tool_calls_from_llama (65-78)
src/utils/endpoints.py (1)
  • get_system_prompt (71-111)
src/metrics/utils.py (1)
  • update_llm_token_count_from_turn (60-77)
src/app/endpoints/streaming_query.py (1)
  • response_generator (140-201)
src/app/endpoints/query_v2.py (6)
src/client.py (2)
  • AsyncLlamaStackClientHolder (18-55)
  • get_client (49-55)
src/configuration.py (3)
  • configuration (60-64)
  • llama_stack_configuration (74-78)
  • mcp_servers (88-92)
src/metrics/utils.py (1)
  • update_llm_token_count_from_turn (60-77)
src/utils/endpoints.py (3)
  • check_configuration_loaded (62-68)
  • get_system_prompt (71-111)
  • validate_model_provider_override (114-134)
src/utils/query.py (4)
  • evaluate_model_hints (94-133)
  • process_transcript_and_persist_conversation (352-383)
  • select_model_and_provider_id (136-213)
  • validate_attachments_metadata (258-287)
src/utils/types.py (2)
  • TurnSummary (59-78)
  • ToolCallSummary (43-56)
src/app/endpoints/query.py (6)
src/models/requests.py (1)
  • QueryRequest (72-222)
src/models/responses.py (1)
  • QueryResponse (48-79)
src/utils/endpoints.py (2)
  • get_agent (138-187)
  • get_system_prompt (71-111)
src/utils/mcp_headers.py (2)
  • mcp_headers_dependency (15-26)
  • handle_mcp_headers_with_toolgroups (56-90)
src/utils/types.py (1)
  • TurnSummary (59-78)
src/utils/query.py (8)
  • evaluate_model_hints (94-133)
  • select_model_and_provider_id (136-213)
  • is_input_shield (242-255)
  • is_output_shield (229-239)
  • validate_attachments_metadata (258-287)
  • validate_query_request (290-335)
  • handle_api_connection_error (338-349)
  • process_transcript_and_persist_conversation (352-383)
src/app/endpoints/conversations.py (3)
src/utils/conversations.py (5)
  • simplify_session_data (109-148)
  • get_conversations_list_base (151-207)
  • validate_conversation_id (210-220)
  • validate_conversation_access (223-259)
  • delete_conversation_base (262-360)
src/models/config.py (1)
  • Action (305-344)
src/client.py (2)
  • AsyncLlamaStackClientHolder (18-55)
  • get_client (49-55)
🔇 Additional comments (12)
src/app/endpoints/conversations.py (9)

1-1: Docstring update aligns with Agent API usage.
Clear and accurate.


21-30: Good centralization on utils.conversations.
Imports look correct and promote reuse.


55-55: Docstring clarifications look good.
Accurately describes Agent API flow and error semantics.

Also applies to: 58-58


72-72: Early ID validation is correct.
Good to fail fast on malformed IDs.


76-81: Access check is consistent with READ_OTHERS policy.
Looks correct given Action.READ_OTHERS_CONVERSATIONS.


84-84: Logging is fine.
Message is informative and scoped.


89-89: (no functional change)


149-153: Helper is correct and concise.
Returns the raw .data which aligns with usage.


172-177: Deletion docstring is accurate.
Matches the behavior and mapped error codes.

src/constants.py (1)

5-5: Version cap aligns with the proposed dependency bounds.

LGTM. Once pyproject upper bounds are added, this gate will be consistent.

src/app/endpoints/query.py (2)

100-101: No change needed — label order is correct.

llm_calls_total is declared with labels ["provider","model"] in src/metrics/init.py and call sites pass (provider_id, model_id) in the endpoints — no changes required.


77-81: Avoid positional unpacking of AuthTuple; use named access or a helper.

Replace "_, _, _, token = auth" with a named attribute (e.g., auth.token) or a small helper that safely extracts the token — positional unpacking is brittle if AuthTuple changes. Location: src/app/endpoints/query.py (lines 77–81). Verification attempt failed: running rg for "AuthTuple" returned "authentication: No such file or directory (os error 2)"; AuthTuple definition not found in repo — confirm its shape and update the unpacking accordingly.

"cause": f"Unknown error while getting conversations for user {user_id}",
},
) from e
return get_conversations_list_base(request, auth)

⚠️ Potential issue

Avoid blocking the event loop when calling a sync base.
get_conversations_list_base is synchronous and does DB I/O; calling it directly from an async endpoint will block the event loop. Run it in a threadpool.

Apply this diff:

@@
-from fastapi import APIRouter, HTTPException, Request, status, Depends
+from fastapi import APIRouter, HTTPException, Request, status, Depends
+from starlette.concurrency import run_in_threadpool
@@
-    return get_conversations_list_base(request, auth)
+    return await run_in_threadpool(get_conversations_list_base, request, auth)
🤖 Prompt for AI Agents
In src/app/endpoints/conversations.py around line 44, the endpoint currently
calls the synchronous get_conversations_list_base directly which blocks the
event loop; change the call to run the sync DB I/O in a threadpool (use
asyncio.to_thread or loop.run_in_executor) and await its result instead. Import
asyncio if needed, replace the direct return with an awaited
asyncio.to_thread(get_conversations_list_base, request, auth) (or equivalent
run_in_executor call) so the blocking work executes off the event loop and the
endpoint remains async.

Comment on lines +75 to +77
logger.info("Llama stack config: %s", configuration.llama_stack_configuration)


⚠️ Potential issue

Avoid logging full Llama Stack configuration at INFO (may expose secrets/URLs).

Log only mode or move to DEBUG without printing the object.

-    logger.info("Llama stack config: %s", configuration.llama_stack_configuration)
+    logger.debug(
+        "Llama Stack configured (mode=%s)",
+        "library" if configuration.llama_stack_configuration.use_as_library_client else "service",
+    )
🤖 Prompt for AI Agents
In src/app/endpoints/query_v2.py around lines 75-76, the current INFO log prints
the entire Llama stack configuration which may expose secrets/URLs; change this
to either logger.debug(...) instead of logger.info(...) or log only
non-sensitive fields (e.g., configuration.llama_stack_configuration.mode or an
explicit safe subset), and ensure any sensitive fields are omitted or masked
before logging.

Comment on lines 123 to 129
if hasattr(chunk, "error"):
yield from _handle_error_event(chunk, chunk_id)

event_type = chunk.event.payload.event_type
step_type = getattr(chunk.event.payload, "step_type", None)

match (event_type, step_type):

⚠️ Potential issue

Error chunks should short‑circuit to avoid AttributeError.

After yielding the error event, the function continues and accesses chunk.event, which may not exist on error chunks.

-    if hasattr(chunk, "error"):
-        yield from _handle_error_event(chunk, chunk_id)
+    if hasattr(chunk, "error"):
+        yield from _handle_error_event(chunk, chunk_id)
+        return

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/utils/streaming_query.py around lines 123 to 129, after yielding the
error event the function continues and then accesses chunk.event which may not
exist on error chunks; modify the control flow so that once hasattr(chunk,
"error") yields the error (yield from _handle_error_event(...)) the function
short‑circuits (e.g., return or continue the loop) to avoid further access to
chunk.event, ensuring error chunks do not execute the subsequent
event_type/step_type logic.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
tests/unit/app/endpoints/test_query.py (1)

113-124: Patch mock target to utils.query.configuration (not app.endpoints.query.configuration)

Replace all occurrences in tests/unit/app/endpoints/test_query.py (lines: 121, 129, 151, 446, 478, 511, 550, 600, 653, 708, 757, 807, 876, 946, 1008, 1105, 1176, 1232, 1283, 1342, 1393, 1559): change
mocker.patch("app.endpoints.query.configuration", ...) -> mocker.patch("utils.query.configuration", ...)

src/app/endpoints/query.py (1)

246-248: Token metrics label mismatch — normalize model label (no provider prefix) across metrics.

Observed mismatch: query.py derives model_label by splitting "provider/model" and passes (model_label, provider_id) to update_llm_token_count_from_turn, but src/metrics/utils.py uses model.identifier directly when registering provider/model metrics, and src/utils/query.py sometimes returns inconsistent values from select_model_and_provider_id.

  • Problem locations: src/app/endpoints/query.py (lines ~244–248), src/metrics/utils.py (update_llm_token_count_from_turn & setup_model_metrics), src/utils/query.py (select_model_and_provider_id).
  • Impact: metrics.provider_model_configuration and runtime token metrics may use different label keys (e.g., ("provider","provider/model") vs ("provider","model")), causing scattered/incorrect metric counts.
  • Fixes:
    • In src/metrics/utils.py, normalize model_name before labeling: model_name = model.identifier.split("/", 1)[1] if "/" in model.identifier else model.identifier.
    • In src/utils/query.py, ensure select_model_and_provider_id always returns a consistent tuple: (llama_stack_model_id as "provider/model", plain_model_name, provider_id) instead of returning model_id twice in the first-LLM branch.
    • Alternatively, add a single helper (e.g., utils.query.get_model_label(llama_stack_model_id)) and use it everywhere to compute the metric label.
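
A minimal sketch of that helper (the name comes from the bullet above; treat placement and naming as illustrative):

def get_model_label(llama_stack_model_id: str) -> str:
    """Strip a leading 'provider/' prefix so metric labels carry the plain model name."""
    if "/" in llama_stack_model_id:
        return llama_stack_model_id.split("/", 1)[1]
    return llama_stack_model_id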
♻️ Duplicate comments (12)
src/app/endpoints/conversations.py (4)

37-45: Don’t block the event loop: run the sync base in a threadpool

get_conversations_list_base is synchronous and does DB I/O. Calling it directly from an async route will block.

Apply:

-    return get_conversations_list_base(request, auth)
+    return await run_in_threadpool(get_conversations_list_base, request, auth)

Add import (outside this hunk):

from starlette.concurrency import run_in_threadpool

92-105: Type-safe session_id extraction (avoid .get on typed SDK models)

SDK objects are typed; .get may raise at runtime. Validate and handle missing IDs explicitly.

-        session_id = str(agent_sessions[0].get("session_id"))
+        first = agent_sessions[0]
+        session_id = (
+            first.get("session_id") if isinstance(first, dict) else getattr(first, "session_id", None)
+        )
+        if not session_id:
+            raise HTTPException(
+                status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
+                detail={"response": "Invalid session", "cause": "session_id missing"},
+            )
+        session_id = str(session_id)
#!/bin/bash
# Find any remaining fragile `.get("session_id")` usages across the repo.
rg -nP -C2 "\.get\(\s*['\"]session_id['\"]\s*\)" || true

187-193: LGTM: async base awaited correctly

Awaiting delete_conversation_base with async helpers resolves the earlier await-mismatch.


158-165: Same typed-access bug in delete helper

Avoid .get; validate session_id before delete.

-    session_id = str(sessions[0].get("session_id"))
-    await client.agents.session.delete(agent_id=agent_id, session_id=session_id)
+    first = sessions[0]
+    session_id = (
+        first.get("session_id") if isinstance(first, dict) else getattr(first, "session_id", None)
+    )
+    if not session_id:
+        raise RuntimeError("session_id missing on session")
+    await client.agents.session.delete(agent_id=agent_id, session_id=str(session_id))
src/utils/query.py (4)

55-62: BUG: configuration accessors must be called (will raise at runtime).

Use configuration.user_data_collection_configuration() accessor before reading transcripts_enabled.

Apply:

-    return configuration.user_data_collection_configuration.transcripts_enabled
+    return configuration.user_data_collection_configuration().transcripts_enabled

158-169: BUG: Don’t overwrite provided model/provider; call config accessors.

Current branch assigns both defaults when either is missing and uses attributes instead of accessors.

Apply:

-    if not model_id or not provider_id:
+    if not model_id or not provider_id:
         logger.debug(
             "No model ID or provider ID specified in request, checking configuration"
         )
-        model_id = configuration.inference.default_model  # type: ignore[reportAttributeAccessIssue]
-        provider_id = (
-            configuration.inference.default_provider  # type: ignore[reportAttributeAccessIssue]
-        )
+        if not model_id:
+            model_id = configuration.inference().default_model
+        if not provider_id:
+            provider_id = configuration.inference().default_provider

175-185: BUG: Wrong tuple and identifiers when auto‑selecting first LLM.

Return value duplicates model_id and doesn’t derive plain model id from provider/model identifier.

Apply:

-            model = next(
+            model = next(
                 m
                 for m in models
                 if m.model_type == "llm"  # pyright: ignore[reportAttributeAccessIssue]
             )
-            model_id = model.identifier
-            provider_id = model.provider_id
-            logger.info("Selected model: %s", model)
-            return model_id, model_id, provider_id
+            llama_stack_model_id = model.identifier
+            provider_id = model.provider_id
+            model_id = (
+                llama_stack_model_id.split("/", 1)[1]
+                if "/" in llama_stack_model_id
+                else llama_stack_model_id
+            )
+            logger.info("Selected model: %s", llama_stack_model_id)
+            return llama_stack_model_id, model_id, provider_id

307-308: Don’t log entire Llama Stack config; call accessor and log safe fields.

Full config may include secrets; also needs accessor call.

Apply:

-    logger.info("Llama stack config: %s", configuration.llama_stack_configuration)
+    cfg = configuration.llama_stack_configuration()
+    logger.info(
+        "Llama Stack version: %s; providers: %s",
+        getattr(cfg, "version", "n/a"),
+        getattr(cfg, "providers", "n/a"),
+    )
src/app/endpoints/streaming_query_v2.py (1)

308-308: SSE headers missing (clients may buffer or misinterpret stream).

Return StreamingResponse with proper media type and no‑buffering headers.

Apply:

-        return StreamingResponse(response_generator(response))
+        return StreamingResponse(
+            response_generator(response),
+            media_type="text/event-stream",
+            headers={
+                "Cache-Control": "no-cache",
+                "Connection": "keep-alive",
+                "X-Accel-Buffering": "no",
+            },
+        )
src/utils/streaming_query.py (1)

123-129: Short-circuit on error chunks to avoid AttributeError.

After yielding the error event, the function still accesses chunk.event.* which may not exist on error chunks.

     if hasattr(chunk, "error"):
-        yield from _handle_error_event(chunk, chunk_id)
+        yield from _handle_error_event(chunk, chunk_id)
+        return
src/app/endpoints/query_v2.py (2)

66-69: Avoid logging full Llama Stack configuration at INFO in validate_query_request.

validate_query_request currently logs the entire configuration object at INFO, which risks leaking secrets/URLs. Downgrade to DEBUG and log only safe fields.

Proposed change in src/utils/query.py:

-    logger.info("Llama stack config: %s", configuration.llama_stack_configuration)
+    logger.debug(
+        "Llama Stack configured (mode=%s)",
+        "library"
+        if configuration.llama_stack_configuration.use_as_library_client
+        else "service",
+    )

226-235: Don’t log user/model content at INFO.

Even truncated, response text can contain sensitive data. Move to DEBUG.

-                logger.info(
+                logger.debug(
                     "Model response content: '%s'",
                     (
                         llm_response[:200] + "..."
                         if len(llm_response) > 200
                         else llm_response
                     ),
                 )
🧹 Nitpick comments (18)
src/app/endpoints/conversations.py (3)

92-101: 404 isn’t an error condition; lower log level

“No sessions found” maps to 404; use warning to reduce error-noise.

-            logger.error("No sessions found for conversation %s", conversation_id)
+            logger.warning("No sessions found for conversation %s", conversation_id)

152-156: Fix return type: client.agents.session.list(...).data returns typed models, not dicts

Current annotation misleads and encourages dict-style access.

-async def get_agent_sessions(client: Any, conversation_id: str) -> list[dict[str, Any]]:
+async def get_agent_sessions(client: Any, conversation_id: str) -> list[Any]:

92-105: Prefer latest session, not arbitrary first

If multiple sessions exist, select the most recent by created_at (if available).

# before picking first:
sessions_sorted = sorted(
    agent_sessions,
    key=lambda s: (getattr(s, "created_at", None) or (s.get("created_at") if isinstance(s, dict) else None)) or "",
    reverse=True,
)
first = sessions_sorted[0]
src/utils/query.py (4)

150-156: Docstring is inaccurate (function returns 3 values).

Update to document (llama_stack_model_id, model_id, provider_id).

Apply:

-    Returns:
-        A tuple containing the combined model ID (in the format
-        "provider/model") and the provider ID.
+    Returns:
+        tuple[str, str, str]: (llama_stack_model_id "provider/model",
+        model_id, provider_id).

242-256: Tighten shield classification; current logic labels unknowns as input.

Use explicit prefix checks; avoid “not output ⇒ input”.

Apply:

-def is_input_shield(shield: Shield) -> bool:
+def is_input_shield(shield: Shield) -> bool:
@@
-    return _is_inout_shield(shield) or not is_output_shield(shield)
+    return _is_inout_shield(shield) or shield.identifier.startswith("input_")

33-52: Align OpenAPI response codes with behavior.

Docs list 400/403/503, while handlers raise 401/500 in places. Consider 401 for UnauthorizedResponse, and 500 (or 503 consistently) for upstream connectivity.

Apply (example):

-    400: {
-        "description": "Missing or invalid credentials provided by client",
-        "model": UnauthorizedResponse,
-    },
+    401: {
+        "description": "Missing or invalid credentials provided by client",
+        "model": UnauthorizedResponse,
+    },
@@
-    503: {
+    500: {
         "detail": {
             "response": "Unable to connect to Llama Stack",
             "cause": "Connection error.",
         }
     },

290-299: Minor: Drop Depends annotation in utility signature.

This is not a FastAPI endpoint; the Depends marker is misleading.

Apply:

-def validate_query_request(
+def validate_query_request(
     request: Request,
     query_request: QueryRequest,
-    auth: Annotated[AuthTuple, Depends],
+    auth: AuthTuple,
 ) -> tuple[str, UserConversation | None, str]:
tests/unit/app/test_routers.py (2)

67-67: Avoid brittle magic number for router count.

Asserting exact length will break when routers are added/removed. Prefer asserting presence only.

Apply:

-    assert len(app.routers) == 14
+    # avoid brittle count; presence checks below suffice

91-91: Same here: drop hardcoded router count.

Apply:

-    assert len(app.routers) == 14
+    # avoid brittle count; verify prefixes instead
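
For example, a presence check instead of a count (assuming the test double records (router, prefix) pairs in app.routers; adjust to the mock's actual shape):

registered_prefixes = {prefix for _, prefix in app.routers}
assert "/v1" in registered_prefixes
assert "/v2" in registered_prefixes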
tests/unit/app/endpoints/test_conversations.py (1)

273-274: Be consistent about patch targets.

You patch utils.conversations.check_suid here but configuration is patched under app.endpoints.conversations elsewhere. Prefer a single module boundary (utils.conversations) to reduce surprises.

tests/unit/app/endpoints/test_streaming_query_v2.py (1)

156-166: Consider asserting SSE headers.

Add checks for media_type text/event-stream and no-buffering headers to catch regressions.

Example:

assert resp.media_type == "text/event-stream"
assert resp.headers.get("X-Accel-Buffering") == "no"
src/app/endpoints/streaming_query_v2.py (2)

329-396: MCP headers param is unused; either plumb it or drop it.

You accept mcp_headers but never apply it to tool definitions.

Option A: merge headers per server URL:

-        mcp_tools = get_mcp_tools(configuration.mcp_servers(), token)
+        mcp_tools = get_mcp_tools(configuration.mcp_servers(), token, mcp_headers=mcp_headers)

…paired with updating get_mcp_tools signature to merge provided headers:

-def get_mcp_tools(mcp_servers: list, token: str | None = None) -> list[dict[str, Any]]:
+def get_mcp_tools(mcp_servers: list, token: str | None = None, mcp_headers: dict[str, dict[str, str]] | None = None) -> list[dict[str, Any]]:
@@
-        if token:
-            tool_def["headers"] = {"Authorization": f"Bearer {token}"}
+        headers = {}
+        if token:
+            headers["Authorization"] = f"Bearer {token}"
+        if mcp_headers:
+            headers.update(mcp_headers.get(mcp_server.url, {}))
+        if headers:
+            tool_def["headers"] = headers

Option B: remove mcp_headers from signatures to avoid dead parameters (less flexible).


12-15: Nit: unused import “status” from starlette.

Remove if not used.

-from starlette import status
tests/unit/app/endpoints/test_query.py (1)

237-243: Wrong default model/provider patch target.

select_model_and_provider_id reads defaults from utils.query.configuration, not metrics.utils.configuration. Patch the correct target so configuration-driven tests behave as intended.

-    mocker.patch(
-        "metrics.utils.configuration.inference.default_provider",
-        "default_provider",
-    )
-    mocker.patch(
-        "metrics.utils.configuration.inference.default_model",
-        "default_model",
-    )
+    mocker.patch("utils.query.configuration.inference.default_provider", "default_provider")
+    mocker.patch("utils.query.configuration.inference.default_model", "default_model")

Also applies to: 277-283

src/utils/streaming_query.py (1)

23-24: Make METADATA_PATTERN non-greedy and EOL-tolerant.

Current pattern can over-capture and requires a trailing newline. Use a non-greedy match and optional EOL.

-METADATA_PATTERN = re.compile(r"\nMetadata: (\{.+})\n")
+METADATA_PATTERN = re.compile(r"\nMetadata:\s*(\{.*?\})(?:\n|$)")
src/app/endpoints/conversations_v2.py (1)

27-29: Nit: use module-specific logger name for filtering.

Helps log routing and ops.

-logger = logging.getLogger("app.endpoints.handlers")
+logger = logging.getLogger("app.endpoints.conversations_v2")
src/app/endpoints/streaming_query.py (1)

62-63: Remove unused METADATA_PATTERN.

SSE parsing moved to utils.streaming_query; this constant is now dead.

-METADATA_PATTERN = re.compile(r"\nMetadata: (\{.+})\n")
src/app/endpoints/query_v2.py (1)

124-131: Wire mcp_headers into Responses API tool headers.

mcp_headers is accepted by the endpoint but unused in retrieve_response(). Merge provided headers with token-derived headers so callers can scope per-server credentials.

@@
-async def retrieve_response(  # pylint: disable=too-many-locals,too-many-branches
+async def retrieve_response(  # pylint: disable=too-many-locals,too-many-branches
@@
-    mcp_headers: dict[str, dict[str, str]] | None = None,
+    mcp_headers: dict[str, dict[str, str]] | None = None,
@@
-        mcp_tools = get_mcp_tools(configuration.mcp_servers, token)
+        mcp_tools = get_mcp_tools(configuration.mcp_servers, token, mcp_headers)
         if mcp_tools:
             tools.extend(mcp_tools)
@@
-def get_mcp_tools(mcp_servers: list, token: str | None = None) -> list[dict[str, Any]]:
+def get_mcp_tools(
+    mcp_servers: list, token: str | None = None, extra_headers: dict[str, dict[str, str]] | None = None
+) -> list[dict[str, Any]]:
@@
-    for mcp_server in mcp_servers:
+    for mcp_server in mcp_servers:
         tool_def = {
             "type": "mcp",
             "server_label": mcp_server.name,
             "server_url": mcp_server.url,
             "require_approval": "never",
         }
 
-        # Add authentication if token provided (Response API format)
-        if token:
-            tool_def["headers"] = {"Authorization": f"Bearer {token}"}
+        # Merge auth from explicit headers and/or token
+        headers: dict[str, str] = {}
+        if extra_headers and (h := extra_headers.get(mcp_server.url) or extra_headers.get(mcp_server.name)):
+            headers.update(h)
+        if token and "Authorization" not in headers:
+            headers["Authorization"] = f"Bearer {token}"
+        if headers:
+            tool_def["headers"] = headers

Also applies to: 176-185, 312-329

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 692ede7 and 61301f7.

📒 Files selected for processing (18)
  • src/app/endpoints/conversations.py (7 hunks)
  • src/app/endpoints/conversations_v2.py (1 hunks)
  • src/app/endpoints/query.py (5 hunks)
  • src/app/endpoints/query_v2.py (1 hunks)
  • src/app/endpoints/streaming_query.py (2 hunks)
  • src/app/endpoints/streaming_query_v2.py (1 hunks)
  • src/app/routers.py (3 hunks)
  • src/constants.py (1 hunks)
  • src/utils/conversations.py (1 hunks)
  • src/utils/query.py (1 hunks)
  • src/utils/streaming_query.py (1 hunks)
  • src/utils/suid.py (1 hunks)
  • tests/unit/app/endpoints/test_conversations.py (14 hunks)
  • tests/unit/app/endpoints/test_query.py (7 hunks)
  • tests/unit/app/endpoints/test_query_v2.py (1 hunks)
  • tests/unit/app/endpoints/test_streaming_query_v2.py (1 hunks)
  • tests/unit/app/test_routers.py (5 hunks)
  • tests/unit/conftest.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/utils/suid.py
  • src/app/routers.py
  • src/utils/conversations.py
🧰 Additional context used
🧬 Code graph analysis (11)
tests/unit/app/endpoints/test_streaming_query_v2.py (4)
src/models/requests.py (1)
  • QueryRequest (72-222)
src/models/config.py (3)
  • config (132-138)
  • ModelContextProtocolServer (161-166)
  • Action (305-344)
src/app/endpoints/streaming_query_v2.py (1)
  • streaming_query_endpoint_handler_v2 (59-321)
src/configuration.py (1)
  • mcp_servers (88-92)
tests/unit/app/endpoints/test_query_v2.py (4)
src/models/requests.py (2)
  • QueryRequest (72-222)
  • Attachment (15-69)
src/models/config.py (2)
  • config (132-138)
  • ModelContextProtocolServer (161-166)
src/app/endpoints/query_v2.py (3)
  • get_rag_tools (298-309)
  • get_mcp_tools (312-328)
  • query_endpoint_handler_v2 (45-116)
src/configuration.py (1)
  • mcp_servers (88-92)
src/app/endpoints/conversations_v2.py (5)
src/authentication/__init__.py (1)
  • get_auth_dependency (14-43)
src/authorization/middleware.py (1)
  • authorize (111-122)
src/models/config.py (2)
  • config (132-138)
  • Action (305-344)
src/models/responses.py (3)
  • ConversationDeleteResponse (443-476)
  • ConversationResponse (393-440)
  • ConversationsListResponse (540-597)
src/utils/conversations.py (4)
  • get_conversations_list_base (150-206)
  • delete_conversation_base (259-357)
  • validate_conversation_id (209-219)
  • validate_conversation_access (222-256)
src/app/endpoints/conversations.py (5)
src/models/config.py (2)
  • config (132-138)
  • Action (305-344)
src/models/responses.py (3)
  • ConversationResponse (393-440)
  • ConversationDeleteResponse (443-476)
  • ConversationsListResponse (540-597)
src/utils/endpoints.py (1)
  • check_configuration_loaded (62-68)
src/utils/conversations.py (5)
  • simplify_session_data (108-147)
  • get_conversations_list_base (150-206)
  • validate_conversation_id (209-219)
  • validate_conversation_access (222-256)
  • delete_conversation_base (259-357)
src/client.py (2)
  • AsyncLlamaStackClientHolder (18-55)
  • get_client (49-55)
src/app/endpoints/streaming_query_v2.py (14)
src/authentication/__init__.py (1)
  • get_auth_dependency (14-43)
src/authorization/middleware.py (1)
  • authorize (111-122)
src/client.py (2)
  • AsyncLlamaStackClientHolder (18-55)
  • get_client (49-55)
src/configuration.py (3)
  • configuration (60-64)
  • llama_stack_configuration (74-78)
  • mcp_servers (88-92)
src/models/config.py (2)
  • config (132-138)
  • Action (305-344)
src/models/database/conversations.py (1)
  • UserConversation (11-36)
src/models/requests.py (1)
  • QueryRequest (72-222)
src/utils/endpoints.py (4)
  • check_configuration_loaded (62-68)
  • get_system_prompt (71-111)
  • validate_model_provider_override (114-134)
  • validate_conversation_ownership (39-59)
src/utils/mcp_headers.py (1)
  • mcp_headers_dependency (15-26)
src/utils/query.py (5)
  • evaluate_model_hints (94-133)
  • is_transcripts_enabled (55-61)
  • persist_user_conversation_details (64-91)
  • select_model_and_provider_id (136-213)
  • validate_attachments_metadata (258-287)
src/utils/streaming_query.py (3)
  • format_stream_data (26-37)
  • stream_start_event (40-61)
  • stream_end_event (64-100)
src/utils/transcripts.py (1)
  • store_transcript (33-86)
src/utils/types.py (2)
  • TurnSummary (59-78)
  • ToolCallSummary (43-56)
src/app/endpoints/query_v2.py (2)
  • get_rag_tools (298-309)
  • get_mcp_tools (312-328)
tests/unit/app/endpoints/test_query.py (1)
src/utils/query.py (2)
  • is_transcripts_enabled (55-61)
  • select_model_and_provider_id (136-213)
src/app/endpoints/query_v2.py (6)
src/client.py (2)
  • AsyncLlamaStackClientHolder (18-55)
  • get_client (49-55)
src/models/requests.py (1)
  • QueryRequest (72-222)
src/models/responses.py (1)
  • QueryResponse (48-79)
src/utils/endpoints.py (1)
  • get_system_prompt (71-111)
src/utils/query.py (4)
  • evaluate_model_hints (94-133)
  • process_transcript_and_persist_conversation (352-383)
  • select_model_and_provider_id (136-213)
  • validate_attachments_metadata (258-287)
src/utils/types.py (2)
  • TurnSummary (59-78)
  • ToolCallSummary (43-56)
src/app/endpoints/streaming_query.py (3)
src/app/endpoints/query.py (1)
  • get_rag_toolgroups (266-293)
src/utils/endpoints.py (3)
  • get_system_prompt (71-111)
  • validate_model_provider_override (114-134)
  • validate_conversation_ownership (39-59)
src/utils/streaming_query.py (3)
  • stream_start_event (40-61)
  • stream_end_event (64-100)
  • stream_build_event (103-146)
src/utils/query.py (8)
src/configuration.py (4)
  • configuration (60-64)
  • user_data_collection_configuration (81-85)
  • inference (121-125)
  • llama_stack_configuration (74-78)
src/app/database.py (1)
  • get_session (34-40)
src/models/database/conversations.py (1)
  • UserConversation (11-36)
src/models/requests.py (2)
  • QueryRequest (72-222)
  • Attachment (15-69)
src/models/responses.py (2)
  • UnauthorizedResponse (361-375)
  • ForbiddenResponse (378-390)
src/utils/endpoints.py (3)
  • check_configuration_loaded (62-68)
  • validate_conversation_ownership (39-59)
  • validate_model_provider_override (114-134)
src/utils/transcripts.py (1)
  • store_transcript (33-86)
src/utils/types.py (1)
  • TurnSummary (59-78)
src/utils/streaming_query.py (4)
src/utils/types.py (2)
  • TurnSummary (59-78)
  • append_tool_calls_from_llama (65-78)
src/utils/endpoints.py (1)
  • get_system_prompt (71-111)
src/metrics/utils.py (1)
  • update_llm_token_count_from_turn (60-77)
src/app/endpoints/streaming_query.py (1)
  • response_generator (142-203)
src/app/endpoints/query.py (7)
src/authentication/__init__.py (1)
  • get_auth_dependency (14-43)
src/client.py (1)
  • AsyncLlamaStackClientHolder (18-55)
src/metrics/utils.py (1)
  • update_llm_token_count_from_turn (60-77)
src/models/requests.py (1)
  • QueryRequest (72-222)
src/models/responses.py (1)
  • QueryResponse (48-79)
src/utils/endpoints.py (2)
  • get_agent (138-187)
  • get_system_prompt (71-111)
src/utils/query.py (8)
  • evaluate_model_hints (94-133)
  • select_model_and_provider_id (136-213)
  • is_input_shield (242-255)
  • is_output_shield (229-239)
  • validate_attachments_metadata (258-287)
  • validate_query_request (290-335)
  • handle_api_connection_error (338-349)
  • process_transcript_and_persist_conversation (352-383)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: e2e_tests
🔇 Additional comments (13)
tests/unit/conftest.py (1)

26-36: LGTM: stable Turn mock prevents “did not complete” errors.

This makes downstream tests resilient to missing output_message.

tests/unit/app/test_routers.py (1)

21-24: LGTM: v2 routers included in import surface.

tests/unit/app/endpoints/test_conversations.py (1)

79-80: LGTM: context‑managed get_session mock is correct.

Matches production usage of get_session() with a context manager.

tests/unit/app/endpoints/test_query_v2.py (2)

57-82: LGTM: no-tools path verifies tools=None and prompt wiring.


118-157: LGTM: output parsing and tool call capture looks robust.

src/utils/query.py (1)

1-384: Repo-wide sweep for configuration accessor misuse — verify and fix accesses.

Found non-call attribute accesses to the configuration object; confirm each is a true attribute (leave) or should be invoked (call) and update accordingly:

  • src/utils/query.py:307 — logger.info("Llama stack config: %s", configuration.llama_stack_configuration)
  • src/lightspeed_stack.py:62 — logger.info("Configuration: %s", configuration.configuration)
  • src/lightspeed_stack.py:64 — logger.info("Llama stack configuration: %s", configuration.llama_stack_configuration)
  • src/app/endpoints/streaming_query.py:100 — logger.info("Llama stack config: %s", configuration.llama_stack_configuration)
  • src/app/endpoints/streaming_query_v2.py:89 — logger.info("Llama stack config: %s", configuration.llama_stack_configuration)

If any referenced symbol is a callable (e.g., an accessor method), replace the attribute access with a call and adjust call sites; otherwise keep as-is.

tests/unit/app/endpoints/test_query.py (2)

95-101: Good: configuration-not-loaded path tested via utils.query.configuration None.

This aligns with the new utils-based organization. No changes needed.


62-64: Mocks now point to centralized utils — nice.

Redirecting ownership and persistence to utils.* keeps tests aligned with the refactor.

src/app/endpoints/conversations_v2.py (2)

32-43: Async placeholders match awaited base signature.

delete_conversation_base awaits the session helpers; keeping these async is correct.


85-101: Resolved: the base is async now, so awaiting it is correct.

Past comments to drop await are obsolete given utils.conversations.delete_conversation_base is async.

src/app/endpoints/streaming_query.py (1)

151-186: SSE generation correctly delegates to shared utilities.

Centralization reduces drift between v1/v2. Looks good.

src/app/endpoints/query.py (2)

100-107: Good consolidation: transcript + conversation persistence via utils.

Reduces duplication and keeps v1/v2 behavior aligned.


191-223: Toolgroups assembly is correct and preserves no_tools semantics.

MCP headers are merged; empty toolgroups coerced to None. LGTM.

Comment on lines +186 to +209
    # Create OpenAI response using responses API
    response = await client.responses.create(
        input=query_request.query,
        model=model_id,
        instructions=system_prompt,
        previous_response_id=query_request.conversation_id,
        tools=(cast(Any, tools) if tools else cast(Any, None)),
        stream=False,
        store=True,
    )

⚠️ Potential issue

🧩 Analysis chain

Consider gating Responses API persistence on transcripts setting.

responses.create(store=True) persists data server-side regardless of local transcripts_enabled. Gate or make configurable to avoid unintended storage.

-    response = await client.responses.create(
+    store_resp = configuration.user_data_collection_configuration.transcripts_enabled
+    response = await client.responses.create(
         input=query_request.query,
         model=model_id,
         instructions=system_prompt,
         previous_response_id=query_request.conversation_id,
         tools=(cast(Any, tools) if tools else cast(Any, None)),
         stream=False,
-        store=True,
+        store=store_resp,
     )

Please confirm product intent: should server-side storage follow local transcript policy?


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "PWD: $(pwd)"
echo "---- rg version ----"
rg --version 2>/dev/null || true

echo "---- Search for responses.create / client.responses.create occurrences ----"
rg -n --hidden --no-ignore "responses\.create\(" || true
rg -n --hidden --no-ignore "client\.responses\.create\(" || true

echo "---- Search for 'store=' usages ----"
rg -n --hidden --no-ignore "store\s*=" -S || true

echo "---- Search for transcripts / user_data_collection_configuration ----"
rg -n --hidden --no-ignore "transcripts_enabled|transcripts|user_data_collection_configuration|user_data_collection" -S || true

echo "---- Search for 'configuration.' references ----"
rg -n --hidden --no-ignore "configuration\." -S || true

echo "---- Show src/app/endpoints/query_v2.py (if present) around lines 1-260 ----"
if [ -f src/app/endpoints/query_v2.py ]; then
  sed -n '1,260p' src/app/endpoints/query_v2.py || true
else
  echo "src/app/endpoints/query_v2.py not found"
fi

Length of output: 84095


Gate Responses API persistence on transcripts setting

responses.create(store=True) persists data server-side regardless of local transcripts_enabled. Calls found in src/app/endpoints/query_v2.py (≈lines 186–195) and src/app/endpoints/streaming_query_v2.py (≈lines 388–395) should pass store=configuration.user_data_collection_configuration.transcripts_enabled (or use utils.query.is_transcripts_enabled()) to avoid unintended remote storage.

-    response = await client.responses.create(
+    store_resp = configuration.user_data_collection_configuration.transcripts_enabled
+    response = await client.responses.create(
         input=query_request.query,
         model=model_id,
         instructions=system_prompt,
         previous_response_id=query_request.conversation_id,
         tools=(cast(Any, tools) if tools else cast(Any, None)),
         stream=False,
-        store=True,
+        store=store_resp,
     )

Confirm product intent: should server-side storage follow local transcript policy?

🤖 Prompt for AI Agents
In src/app/endpoints/query_v2.py around lines 186 to 195, the call to
client.responses.create currently sets store=True which forces server-side
persistence regardless of local transcript settings; change the store argument
to respect the app configuration by using
configuration.user_data_collection_configuration.transcripts_enabled (or
utils.query.is_transcripts_enabled() if preferred) so the call becomes
store=configuration.user_data_collection_configuration.transcripts_enabled, and
ensure configuration is in scope or import the utils helper if using it.

Comment on lines +88 to +90
    # log Llama Stack configuration
    logger.info("Llama stack config: %s", configuration.llama_stack_configuration)


⚠️ Potential issue

Don’t log entire config; call accessor and log safe fields.

Same concern as in utils.query.

Apply:

-    logger.info("Llama stack config: %s", configuration.llama_stack_configuration)
+    cfg = configuration.llama_stack_configuration()
+    logger.info(
+        "Llama Stack version: %s; providers: %s",
+        getattr(cfg, "version", "n/a"),
+        getattr(cfg, "providers", "n/a"),
+    )
🤖 Prompt for AI Agents
In src/app/endpoints/streaming_query_v2.py around lines 88 to 90, avoid logging
the entire configuration object; instead call the accessor method that returns
only safe-to-log fields and log those specific fields (e.g., model name,
temperature, and any non-sensitive flags). Replace the direct logger.info of
configuration.llama_stack_configuration with a call like
configuration.get_safe_llama_stack_config() (or the existing accessor used in
utils.query) and log the returned safe dict/values so no secrets or sensitive
internal state are emitted.

Comment on lines +116 to +119
        llama_stack_model_id, model_id, provider_id = select_model_and_provider_id(
            await client.models.list(),
            *evaluate_model_hints(user_conversation=None, query_request=query_request),
        )

⚠️ Potential issue

BUG: evaluate_model_hints ignores validated conversation context.

You pass user_conversation=None instead of the resolved user_conversation.

Apply:

-            *evaluate_model_hints(user_conversation=None, query_request=query_request),
+            *evaluate_model_hints(
+                user_conversation=user_conversation, query_request=query_request
+            ),
🤖 Prompt for AI Agents
In src/app/endpoints/streaming_query_v2.py around lines 116 to 119,
evaluate_model_hints is being called with user_conversation=None which ignores
the validated/resolved conversation context; replace the None with the actual
resolved user_conversation variable (the one validated/constructed earlier in
this function) so evaluate_model_hints receives the real conversation, ensuring
model selection uses the correct context.

Comment on lines +379 to +381
        mcp_tools = get_mcp_tools(configuration.mcp_servers, token)
        if mcp_tools:
            tools.extend(mcp_tools)

⚠️ Potential issue

BUG: call configuration.mcp_servers() accessor.

Attribute access will fail; calling the accessor also matches the other config accessors.

Apply:

-        mcp_tools = get_mcp_tools(configuration.mcp_servers, token)
+        mcp_tools = get_mcp_tools(configuration.mcp_servers(), token)
🤖 Prompt for AI Agents
In src/app/endpoints/streaming_query_v2.py around lines 379 to 381, the code
incorrectly accesses configuration.mcp_servers as an attribute; change it to
call the accessor configuration.mcp_servers() to match other config accessors
and avoid an attribute access error — call
get_mcp_tools(configuration.mcp_servers(), token) and keep the conditional and
tools.extend logic the same.

src/constants.py (Outdated)
 # Minimal and maximal supported Llama Stack version
 MINIMAL_SUPPORTED_LLAMA_STACK_VERSION = "0.2.17"
-MAXIMAL_SUPPORTED_LLAMA_STACK_VERSION = "0.2.20"
+MAXIMAL_SUPPORTED_LLAMA_STACK_VERSION = "0.2.21"

⚠️ Potential issue

🧩 Analysis chain

Confirm upstream compatibility for 0.2.21 before merge.

Bumping MAXIMAL_SUPPORTED_LLAMA_STACK_VERSION impacts gating and client initialization. Please verify upstream supports 0.2.21 across your deployment matrix and adjust docs/tests if needed.

Run to sanity‑check version gates and any hardcoded expectations:


🏁 Script executed:

#!/bin/bash
# Find version guards and hardcoded mentions
rg -nP 'MAXIMAL_SUPPORTED_LLAMA_STACK_VERSION|MINIMAL_SUPPORTED_LLAMA_STACK_VERSION|0\.2\.2[0-9]' -C2

Length of output: 9300


Confirm upstream compatibility for llama-stack 0.2.21 and align repository pins/docs/tests

MAXIMAL_SUPPORTED_LLAMA_STACK_VERSION was bumped to "0.2.21" but the repo currently pins/mentions 0.2.20. Resolve before merge: either upgrade the pinned deps, lock file, docs, and tests to 0.2.21 (and regenerate uv.lock) if upstream actually provides 0.2.21, or revert the constant to "0.2.20".

  • Affected locations (update or verify): pyproject.toml, uv.lock, examples/pyproject.llamastack.toml, README.md, docs/getting_started.md, docs/deployment_guide.md, docs/output.md, docs/openapi.md/openapi.json, src/models/responses.py, tests/e2e/features/info.feature, tests/unit/utils/test_llama_stack_version.py.
🤖 Prompt for AI Agents
In src/constants.py around line 5 the MAXIMAL_SUPPORTED_LLAMA_STACK_VERSION was
changed to "0.2.21" but the repo still pins/mentions "0.2.20"; either update all
repository pins, locks, docs, tests and examples to 0.2.21 (pyproject.toml,
uv.lock, examples/pyproject.llamastack.toml, README.md, docs/* including
getting_started.md, deployment_guide.md, output.md, openapi.md/openapi.json,
src/models/responses.py, tests/e2e/features/info.feature,
tests/unit/utils/test_llama_stack_version.py) and regenerate uv.lock, or revert
this constant back to "0.2.20" so pins/docs/tests remain consistent before
merging.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/app/endpoints/streaming_query.py (1)

350-354: Call configuration accessor for MCP servers.

Fix attribute vs method usage in both places.

-            for mcp_server in configuration.mcp_servers:
+            for mcp_server in configuration.mcp_servers():
                 mcp_headers[mcp_server.url] = {
                     "Authorization": f"Bearer {token}",
                 }
@@
-        toolgroups = (get_rag_toolgroups(vector_db_ids) or []) + [
-            mcp_server.name for mcp_server in configuration.mcp_servers
+        toolgroups = (get_rag_toolgroups(vector_db_ids) or []) + [
+            mcp_server.name for mcp_server in configuration.mcp_servers()
         ]

Also applies to: 367-368

🧹 Nitpick comments (15)
src/app/endpoints/conversations_v2.py (2)

27-27: Use module logger name.

Prefer module-scoped logger for consistency.

-logger = logging.getLogger("app.endpoints.handlers")
+logger = logging.getLogger(__name__)

55-57: Annotate auth dependency with AuthTuple.

Improves type clarity and matches project conventions.

-    auth: Any = Depends(auth_dependency),
+    auth: Annotated[AuthTuple, Depends(auth_dependency)],

Add missing import at top of file:

from authentication.interface import AuthTuple

Also applies to: 65-68, 90-93

src/app/endpoints/query.py (1)

47-47: Use module logger name.

Align with logging guideline.

-logger = logging.getLogger("app.endpoints.handlers")
+logger = logging.getLogger(__name__)
src/utils/query.py (1)

30-30: Use module logger name.

Consistent logging naming.

-logger = logging.getLogger("utils.query")
+logger = logging.getLogger(__name__)
src/app/endpoints/query_v2.py (1)

38-38: Use module logger name.

Consistent logging naming.

-logger = logging.getLogger("app.endpoints.handlers")
+logger = logging.getLogger(__name__)
tests/unit/app/endpoints/test_conversations.py (1)

24-24: Use shared MOCK_AUTH constant.

Tests should use the shared auth mock for consistency.

-MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token")
+# Prefer: from tests.unit.conftest import MOCK_AUTH
src/app/endpoints/streaming_query_v2.py (1)

52-52: Use module logger name.

Consistent logging naming.

-logger = logging.getLogger("app.endpoints.handlers")
+logger = logging.getLogger(__name__)
src/app/endpoints/streaming_query.py (3)

141-142: Avoid logging full config; log safe fields and call accessor.

Prevents secrets leakage.

-    logger.info("Llama stack config: %s", configuration.llama_stack_configuration)
+    cfg = configuration.llama_stack_configuration()
+    logger.info(
+        "Llama Stack version: %s; providers: %s",
+        getattr(cfg, "version", "n/a"),
+        getattr(cfg, "providers", "n/a"),
+    )

256-256: Set SSE media type and headers.

Improves client compatibility and reduces buffering.

-        return StreamingResponse(response_generator(response))
+        return StreamingResponse(
+            response_generator(response),
+            media_type="text/event-stream",
+            headers={
+                "Cache-Control": "no-cache",
+                "Connection": "keep-alive",
+                "X-Accel-Buffering": "no",
+            },
+        )

58-58: Use module logger name.

Consistent logging naming.

-logger = logging.getLogger("app.endpoints.handlers")
+logger = logging.getLogger(__name__)
src/utils/streaming_query.py (1)

21-21: Use module logger name.

Consistent logging naming.

-logger = logging.getLogger("utils.streaming_query")
+logger = logging.getLogger(__name__)
src/app/endpoints/conversations.py (1)

32-32: Use module-standard logger name

Prefer __name__ per guidelines for a consistent logger hierarchy.

Apply this diff:

-logger = logging.getLogger("app.endpoints.handlers")
+logger = logging.getLogger(__name__)
src/utils/conversations.py (3)

239-244: Clarify error message for non-UUID SUIDs

check_suid accepts resp-* IDs; the message should not claim “not a valid UUID”.

Apply this diff:

-            detail={
-                "response": "Invalid conversation ID format",
-                "cause": f"Conversation ID {conversation_id} is not a valid UUID",
-            },
+            detail={
+                "response": "Invalid conversation ID format",
+                "cause": f"Conversation ID {conversation_id} is not a valid conversation ID format",
+            },

345-346: Avoid blocking in async path: run local DB delete in threadpool

delete_conversation performs DB I/O; run it off the event loop.

Apply this diff:

-from fastapi import HTTPException, Request, status
+from fastapi import HTTPException, Request, status
+from starlette.concurrency import run_in_threadpool
@@
-        delete_conversation(conversation_id=conversation_id)
+        await run_in_threadpool(delete_conversation, conversation_id=conversation_id)

Also applies to: 8-12


28-28: Use module-standard logger name

Align with guideline to use __name__.

Apply this diff:

-logger = logging.getLogger("utils.conversations")
+logger = logging.getLogger(__name__)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 61301f7 and d16d834.

📒 Files selected for processing (18)
  • src/app/endpoints/conversations.py (7 hunks)
  • src/app/endpoints/conversations_v2.py (1 hunks)
  • src/app/endpoints/query.py (5 hunks)
  • src/app/endpoints/query_v2.py (1 hunks)
  • src/app/endpoints/streaming_query.py (2 hunks)
  • src/app/endpoints/streaming_query_v2.py (1 hunks)
  • src/app/routers.py (3 hunks)
  • src/constants.py (1 hunks)
  • src/utils/conversations.py (1 hunks)
  • src/utils/query.py (1 hunks)
  • src/utils/streaming_query.py (1 hunks)
  • src/utils/suid.py (1 hunks)
  • tests/unit/app/endpoints/test_conversations.py (14 hunks)
  • tests/unit/app/endpoints/test_query.py (7 hunks)
  • tests/unit/app/endpoints/test_query_v2.py (1 hunks)
  • tests/unit/app/endpoints/test_streaming_query_v2.py (1 hunks)
  • tests/unit/app/test_routers.py (5 hunks)
  • tests/unit/conftest.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (8)
  • tests/unit/app/endpoints/test_query_v2.py
  • tests/unit/app/test_routers.py
  • tests/unit/app/endpoints/test_query.py
  • src/app/routers.py
  • src/utils/suid.py
  • tests/unit/app/endpoints/test_streaming_query_v2.py
  • src/constants.py
  • tests/unit/conftest.py
🧰 Additional context used
📓 Path-based instructions (7)
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)

Files:

  • src/app/endpoints/conversations_v2.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query_v2.py
  • src/app/endpoints/query.py
  • src/utils/query.py
  • src/utils/streaming_query.py
  • src/utils/conversations.py
  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/conversations.py
src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code

Files:

  • src/app/endpoints/conversations_v2.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query_v2.py
  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/conversations.py
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(__name__) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @abstractmethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed

Files:

  • src/app/endpoints/conversations_v2.py
  • src/app/endpoints/streaming_query.py
  • tests/unit/app/endpoints/test_conversations.py
  • src/app/endpoints/query_v2.py
  • src/app/endpoints/query.py
  • src/utils/query.py
  • src/utils/streaming_query.py
  • src/utils/conversations.py
  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/conversations.py
src/{app/**/*.py,client.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use async def for I/O-bound operations and external API calls

Files:

  • src/app/endpoints/conversations_v2.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query_v2.py
  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/conversations.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling

Files:

  • src/app/endpoints/conversations_v2.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query_v2.py
  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/conversations.py
tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard

Files:

  • tests/unit/app/endpoints/test_conversations.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests

Files:

  • tests/unit/app/endpoints/test_conversations.py
🧠 Learnings (1)
📚 Learning: 2025-09-18T16:46:33.353Z
Learnt from: CR
PR: lightspeed-core/lightspeed-stack#0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to src/client.py : Handle Llama Stack APIConnectionError when interacting with the Llama client

Applied to files:

  • tests/unit/app/endpoints/test_conversations.py
🧬 Code graph analysis (9)
src/app/endpoints/conversations_v2.py (5)
src/authentication/__init__.py (1)
  • get_auth_dependency (14-43)
src/authorization/middleware.py (1)
  • authorize (111-122)
src/models/config.py (2)
  • config (138-144)
  • Action (311-350)
src/models/responses.py (3)
  • ConversationDeleteResponse (443-476)
  • ConversationResponse (393-440)
  • ConversationsListResponse (540-597)
src/utils/conversations.py (4)
  • get_conversations_list_base (175-231)
  • delete_conversation_base (284-382)
  • validate_conversation_id (234-244)
  • validate_conversation_access (247-281)
src/app/endpoints/streaming_query.py (4)
src/app/endpoints/query.py (1)
  • get_rag_toolgroups (266-293)
src/utils/endpoints.py (5)
  • check_configuration_loaded (63-69)
  • get_agent (139-192)
  • get_system_prompt (72-112)
  • validate_model_provider_override (115-135)
  • validate_conversation_ownership (40-60)
src/utils/query.py (7)
  • is_input_shield (242-255)
  • is_output_shield (229-239)
  • is_transcripts_enabled (55-61)
  • select_model_and_provider_id (136-213)
  • validate_attachments_metadata (258-287)
  • persist_user_conversation_details (64-91)
  • evaluate_model_hints (94-133)
src/utils/streaming_query.py (3)
  • stream_start_event (40-61)
  • stream_end_event (64-100)
  • stream_build_event (104-146)
src/app/endpoints/query_v2.py (9)
src/authentication/__init__.py (1)
  • get_auth_dependency (14-43)
src/authorization/middleware.py (1)
  • authorize (111-122)
src/client.py (2)
  • AsyncLlamaStackClientHolder (18-55)
  • get_client (49-55)
src/models/requests.py (1)
  • QueryRequest (72-222)
src/models/responses.py (1)
  • QueryResponse (48-79)
src/utils/endpoints.py (1)
  • get_system_prompt (72-112)
src/utils/mcp_headers.py (1)
  • mcp_headers_dependency (15-26)
src/utils/query.py (5)
  • evaluate_model_hints (94-133)
  • process_transcript_and_persist_conversation (357-388)
  • select_model_and_provider_id (136-213)
  • validate_attachments_metadata (258-287)
  • validate_query_request (290-340)
src/utils/types.py (2)
  • TurnSummary (59-78)
  • ToolCallSummary (43-56)
src/app/endpoints/query.py (3)
src/client.py (1)
  • AsyncLlamaStackClientHolder (18-55)
src/utils/endpoints.py (1)
  • get_agent (139-192)
src/utils/query.py (8)
  • evaluate_model_hints (94-133)
  • select_model_and_provider_id (136-213)
  • is_input_shield (242-255)
  • is_output_shield (229-239)
  • validate_attachments_metadata (258-287)
  • validate_query_request (290-340)
  • handle_api_connection_error (343-354)
  • process_transcript_and_persist_conversation (357-388)
src/utils/query.py (9)
src/configuration.py (4)
  • configuration (65-69)
  • user_data_collection_configuration (86-90)
  • inference (126-130)
  • llama_stack_configuration (79-83)
src/app/database.py (1)
  • get_session (34-40)
src/models/config.py (2)
  • config (138-144)
  • Action (311-350)
src/models/database/conversations.py (1)
  • UserConversation (11-36)
src/models/requests.py (2)
  • QueryRequest (72-222)
  • Attachment (15-69)
src/models/responses.py (2)
  • UnauthorizedResponse (361-375)
  • ForbiddenResponse (378-390)
src/utils/endpoints.py (3)
  • check_configuration_loaded (63-69)
  • validate_conversation_ownership (40-60)
  • validate_model_provider_override (115-135)
src/utils/transcripts.py (1)
  • store_transcript (33-86)
src/utils/types.py (1)
  • TurnSummary (59-78)
src/utils/streaming_query.py (3)
src/utils/types.py (2)
  • TurnSummary (59-78)
  • append_tool_calls_from_llama (65-78)
src/utils/endpoints.py (1)
  • get_system_prompt (72-112)
src/metrics/utils.py (1)
  • update_llm_token_count_from_turn (60-77)
src/utils/conversations.py (7)
src/client.py (2)
  • AsyncLlamaStackClientHolder (18-55)
  • get_client (49-55)
src/app/database.py (1)
  • get_session (34-40)
src/models/config.py (2)
  • config (138-144)
  • Action (311-350)
src/models/database/conversations.py (1)
  • UserConversation (11-36)
src/models/responses.py (4)
  • ConversationDeleteResponse (443-476)
  • ConversationsListResponse (540-597)
  • ConversationDetails (479-537)
  • UnauthorizedResponse (361-375)
src/utils/endpoints.py (3)
  • check_configuration_loaded (63-69)
  • delete_conversation (23-37)
  • validate_conversation_ownership (40-60)
src/utils/suid.py (1)
  • check_suid (15-44)
src/app/endpoints/streaming_query_v2.py (9)
src/client.py (2)
  • AsyncLlamaStackClientHolder (18-55)
  • get_client (49-55)
src/configuration.py (3)
  • configuration (65-69)
  • llama_stack_configuration (79-83)
  • mcp_servers (93-97)
src/models/requests.py (1)
  • QueryRequest (72-222)
src/utils/endpoints.py (4)
  • check_configuration_loaded (63-69)
  • get_system_prompt (72-112)
  • validate_model_provider_override (115-135)
  • validate_conversation_ownership (40-60)
src/utils/query.py (5)
  • evaluate_model_hints (94-133)
  • is_transcripts_enabled (55-61)
  • persist_user_conversation_details (64-91)
  • select_model_and_provider_id (136-213)
  • validate_attachments_metadata (258-287)
src/utils/streaming_query.py (3)
  • format_stream_data (26-37)
  • stream_start_event (40-61)
  • stream_end_event (64-100)
src/utils/transcripts.py (1)
  • store_transcript (33-86)
src/utils/types.py (2)
  • TurnSummary (59-78)
  • ToolCallSummary (43-56)
src/app/endpoints/query_v2.py (2)
  • get_rag_tools (298-309)
  • get_mcp_tools (312-328)
src/app/endpoints/conversations.py (6)
src/configuration.py (1)
  • configuration (65-69)
src/models/config.py (2)
  • config (138-144)
  • Action (311-350)
src/models/responses.py (3)
  • ConversationResponse (393-440)
  • ConversationDeleteResponse (443-476)
  • ConversationsListResponse (540-597)
src/utils/endpoints.py (1)
  • check_configuration_loaded (63-69)
src/utils/conversations.py (5)
  • simplify_session_data (133-172)
  • get_conversations_list_base (175-231)
  • validate_conversation_id (234-244)
  • validate_conversation_access (247-281)
  • delete_conversation_base (284-382)
src/client.py (2)
  • AsyncLlamaStackClientHolder (18-55)
  • get_client (49-55)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: e2e_tests
🔇 Additional comments (15)
src/app/endpoints/conversations_v2.py (3)

32-43: Resolved: helpers are correctly async now (previous sync suggestion is obsolete).

get_response_sessions aligns with async delete_conversation_base and is awaited at call site.


45-49: Resolved: deletion helper is correctly async.

Matches the awaited usage and async base signature.


95-101: Resolved: using await on async base is correct.

delete_conversation_base is async in utils; keeping await avoids TypeError.

src/utils/query.py (3)

307-308: Avoid logging full config; log safe fields and call accessor.

Prevents secrets leakage and fixes accessor usage.

-    logger.info("Llama stack config: %s", configuration.llama_stack_configuration)
+    cfg = configuration.llama_stack_configuration()
+    logger.info(
+        "Llama Stack version: %s; providers: %s",
+        getattr(cfg, "version", "n/a"),
+        getattr(cfg, "providers", "n/a"),
+    )

136-214: Fix model/provider selection: wrong config access and wrong early return.

  • Call configuration.inference() accessor.
  • Return (llama_stack_model_id, model_id, provider_id) when defaulting to first LLM.
 def select_model_and_provider_id(
     models: ModelListResponse, model_id: str | None, provider_id: str | None
 ) -> tuple[str, str, str]:
@@
-    if not model_id or not provider_id:
+    if not model_id or not provider_id:
         logger.debug(
             "No model ID or provider ID specified in request, checking configuration"
         )
-        model_id = configuration.inference.default_model  # type: ignore[reportAttributeAccessIssue]
-        provider_id = (
-            configuration.inference.default_provider  # type: ignore[reportAttributeAccessIssue]
-        )
+        model_id = configuration.inference().default_model
+        provider_id = configuration.inference().default_provider
@@
-    if not model_id or not provider_id:
+    if not model_id or not provider_id:
         logger.debug(
             "No model ID or provider ID specified in request or configuration, "
             "using the first available LLM"
         )
         try:
             model = next(
                 m
                 for m in models
                 if m.model_type == "llm"  # pyright: ignore[reportAttributeAccessIssue]
             )
-            model_id = model.identifier
-            provider_id = model.provider_id
-            logger.info("Selected model: %s", model)
-            return model_id, model_id, provider_id
+            llama_stack_model_id = model.identifier
+            provider_id = model.provider_id
+            # derive plain model id if identifier is "provider/model"
+            model_id = (
+                llama_stack_model_id.split("/", 1)[1]
+                if "/" in llama_stack_model_id
+                else llama_stack_model_id
+            )
+            logger.info("Selected model: %s", llama_stack_model_id)
+            return llama_stack_model_id, model_id, provider_id
@@
-    llama_stack_model_id = f"{provider_id}/{model_id}"
+    llama_stack_model_id = f"{provider_id}/{model_id}"
@@
-    return llama_stack_model_id, model_id, provider_id
+    return llama_stack_model_id, model_id, provider_id

55-61: Bug: configuration accessor not called (runtime AttributeError).

user_data_collection_configuration is a method.

-def is_transcripts_enabled() -> bool:
+def is_transcripts_enabled() -> bool:
@@
-    return configuration.user_data_collection_configuration.transcripts_enabled
+    return configuration.user_data_collection_configuration().transcripts_enabled
src/app/endpoints/streaming_query_v2.py (4)

88-90: Avoid logging full config; log safe fields and call accessor.

Prevents secrets leakage.

-    logger.info("Llama stack config: %s", configuration.llama_stack_configuration)
+    cfg = configuration.llama_stack_configuration()
+    logger.info(
+        "Llama Stack version: %s; providers: %s",
+        getattr(cfg, "version", "n/a"),
+        getattr(cfg, "providers", "n/a"),
+    )

306-309: Set proper SSE media type and headers.

Ensures clients process events without buffering.

-        return StreamingResponse(response_generator(response))
+        return StreamingResponse(
+            response_generator(response),
+            media_type="text/event-stream",
+            headers={
+                "Cache-Control": "no-cache",
+                "Connection": "keep-alive",
+                "X-Accel-Buffering": "no",
+            },
+        )

116-119: Pass the resolved conversation to evaluate_model_hints.

Currently ignores validated conversation context.

-            *evaluate_model_hints(user_conversation=None, query_request=query_request),
+            *evaluate_model_hints(
+                user_conversation=user_conversation, query_request=query_request
+            ),

379-381: Call configuration accessor for MCP servers.

Fix attribute vs method usage.

-        mcp_tools = get_mcp_tools(configuration.mcp_servers, token)
+        mcp_tools = get_mcp_tools(configuration.mcp_servers(), token)
src/utils/streaming_query.py (1)

123-126: Short-circuit on error chunks to avoid AttributeError.

Don’t access chunk.event after handling an error chunk.

-    if hasattr(chunk, "error"):
-        yield from _handle_error_event(chunk, chunk_id)
+    if hasattr(chunk, "error"):
+        yield from _handle_error_event(chunk, chunk_id)
+        return
src/app/endpoints/query_v2.py (1)

194-195: Gate server-side persistence to transcripts setting.

Avoid unintended storage when transcripts are disabled.

-        stream=False,
-        store=True,
+        stream=False,
+        store=configuration.user_data_collection_configuration().transcripts_enabled,
src/app/endpoints/conversations.py (3)

103-105: Don’t call .get on possibly typed session objects; extract session_id safely

Agent SDK may return typed objects; .get will raise. Handle dict/attr and fail fast if missing.

Apply this diff:

-        session_id = str(agent_sessions[0].get("session_id"))
+        first = agent_sessions[0]
+        session_id = (
+            first.get("session_id") if isinstance(first, dict) else getattr(first, "session_id", None)
+        )
+        if not session_id:
+            raise HTTPException(
+                status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
+                detail={"response": "session_id missing on session"},
+            )
+        session_id = str(session_id)

44-44: Run sync base in a threadpool to avoid blocking the event loop

get_conversations_list_base does DB I/O and is sync; calling it directly from an async endpoint will block.

Apply this diff:

-from fastapi import APIRouter, HTTPException, Request, status, Depends
+from fastapi import APIRouter, HTTPException, Request, status, Depends
+from starlette.concurrency import run_in_threadpool
@@
-    return get_conversations_list_base(request, auth)
+    return await run_in_threadpool(get_conversations_list_base, request, auth)

Also applies to: 8-8


152-156: Broaden helper typings and fix session_id extraction for delete path

Avoid assuming dicts; accept list[Any] and handle dict/attr for session_id.

Apply this diff:

-async def get_agent_sessions(client: Any, conversation_id: str) -> list[dict[str, Any]]:
+async def get_agent_sessions(client: Any, conversation_id: str) -> list[Any]:
@@
-async def delete_agent_sessions(
-    client: Any, conversation_id: str, sessions: list[dict[str, Any]]
-) -> None:
+async def delete_agent_sessions(
+    client: Any, conversation_id: str, sessions: list[Any]
+) -> None:
@@
-    session_id = str(sessions[0].get("session_id"))
+    first = sessions[0]
+    session_id = (
+        first.get("session_id") if isinstance(first, dict) else getattr(first, "session_id", None)
+    )
+    if not session_id:
+        raise HTTPException(
+            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
+            detail={"response": "session_id missing on session"},
+        )
+    session_id = str(session_id)

Also applies to: 158-164

Comment on lines +177 to +193
mcp_tools = get_mcp_tools(configuration.mcp_servers, token)
if mcp_tools:
tools.extend(mcp_tools)

⚠️ Potential issue

Call configuration accessor for MCP servers.

mcp_servers is a method.

-        mcp_tools = get_mcp_tools(configuration.mcp_servers, token)
+        mcp_tools = get_mcp_tools(configuration.mcp_servers(), token)
🤖 Prompt for AI Agents
In src/app/endpoints/query_v2.py around lines 177 to 179, the code treats
configuration.mcp_servers as an attribute but it is a method; call
configuration.mcp_servers() and pass its return value to get_mcp_tools (i.e.,
replace configuration.mcp_servers with configuration.mcp_servers()), ensuring
any required arguments or async/await usage are respected if the accessor is
async.

Comment on lines +388 to +396
response = await client.responses.create(
input=query_request.query,
model=model_id,
instructions=system_prompt,
previous_response_id=query_request.conversation_id,
tools=(cast(Any, tools) if tools else cast(Any, None)),
stream=True,
store=True,
)

⚠️ Potential issue

Gate server-side persistence to transcripts setting.

Avoid persisting when transcripts are disabled.

-    response = await client.responses.create(
+    response = await client.responses.create(
         input=query_request.query,
         model=model_id,
         instructions=system_prompt,
         previous_response_id=query_request.conversation_id,
         tools=(cast(Any, tools) if tools else cast(Any, None)),
         stream=True,
-        store=True,
+        store=configuration.user_data_collection_configuration().transcripts_enabled,
     )
🤖 Prompt for AI Agents
In src/app/endpoints/streaming_query_v2.py around lines 388 to 396, the call to
client.responses.create always sets store=True which persists responses even
when server-side transcripts are disabled; change this to gate persistence
behind the transcripts/analytics setting (e.g. use the app config or request
flag that indicates transcripts are enabled) and pass store=(True if
transcripts_enabled else False) (or omit the store parameter when disabled) so
responses are only persisted when the transcripts feature is enabled.

Comment on lines 68 to 73
200: {
"conversation_id": "123e4567-e89b-12d3-a456-426614174000",
"success": True,
"message": "Conversation deleted successfully",
},
400: {

⚠️ Potential issue

OpenAPI schema mismatch: use response field (not message)

ConversationDeleteResponse uses response; docs example shows message.

Apply this diff:

 conversation_delete_responses: dict[int | str, dict[str, Any]] = {
         "conversation_id": "123e4567-e89b-12d3-a456-426614174000",
         "success": True,
-        "message": "Conversation deleted successfully",
+        "response": "Conversation deleted successfully",
     },
🤖 Prompt for AI Agents
In src/utils/conversations.py around lines 68 to 73, the OpenAPI example for the
200 response uses the key "message" but the ConversationDeleteResponse schema
expects "response"; update the example to replace the "message" field with
"response" (e.g., "response": "Conversation deleted successfully") so the docs
match the schema.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (7)
src/utils/suid.py (2)

21-25: Docstring incorrect for Responses IDs.

States “follows a valid UUID string” but we should treat resp-* as opaque; update wording accordingly.

Apply:

-    Returns True if the string is a valid UUID or if it starts with resp-/resp_
-    and it follows a valid UUID string, False otherwise.
+    Returns True if the string is a valid UUID, or if it starts with resp-/resp_
+    and contains a non-empty opaque token, False otherwise.

26-33: Parameter type mismatch in docstring.

Docstring says str | bytes, function accepts str. Align docstring (or add bytes support by decoding).

Apply either:

  • Update docstring to str only, or
  • Accept bytes:
-def check_suid(suid: str) -> bool:
+def check_suid(suid: str | bytes) -> bool:
@@
-    if not isinstance(suid, str) or not suid:
+    if isinstance(suid, (bytes, bytearray)):
+        try:
+            suid = suid.decode("utf-8")
+        except Exception:
+            return False
+    if not isinstance(suid, str) or not suid:
         return False
src/app/routers.py (1)

32-41: Section label/order mismatch.

conversations_v2 is mounted under /v2 but appears in the “V1 endpoints” block. Move it under the V2 section or fix the comment to avoid confusion.
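
A minimal sketch of the intended grouping (include style assumed; adjust to the actual calls in src/app/routers.py):

# V1 endpoints
app.include_router(conversations.router, prefix="/v1")

# V2 endpoints
app.include_router(query_v2.router, prefix="/v2")
app.include_router(streaming_query_v2.router, prefix="/v2")
app.include_router(conversations_v2.router, prefix="/v2")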

tests/unit/app/test_routers.py (2)

65-80: Reduce brittleness: avoid asserting exact router count.

Exact counts break when routes evolve. Prefer presence checks against an expected set.

Example:

-    assert len(app.routers) == 15
+    # keep flexible: assert presence instead of exact count

83-104: Add coverage for /v3 conversations mount.

conversations.router is also included under /v3; extend the test to validate that additional prefix and prevent regressions.

You can add a helper that returns all prefixes for a router and assert "/v1" in prefixes and "/v3" in prefixes.
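
A sketch of that helper, assuming the suite's include_router stub records (router, prefix) pairs in app.routers; MockFastAPI and mount_routers below are hypothetical stand-ins for the test module's existing fixtures:

from typing import Any

def get_router_prefixes(app: Any, router: Any) -> set[str]:
    """Collect every prefix a given router was mounted under."""
    return {prefix for mounted, prefix in app.routers if mounted is router}

def test_conversations_router_mounted_under_v1_and_v3() -> None:
    """The conversations router must be reachable under both /v1 and /v3."""
    app = MockFastAPI()  # hypothetical test double
    mount_routers(app)   # hypothetical: however the suite includes routers
    prefixes = get_router_prefixes(app, conversations.router)
    assert "/v1" in prefixes
    assert "/v3" in prefixes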

src/app/endpoints/streaming_query_v2.py (2)

52-53: Use module logger name for consistency.

Prefer getLogger(__name__) per guidelines.

-logger = logging.getLogger("app.endpoints.handlers")
+logger = logging.getLogger(__name__)

365-387: Optional: propagate per-server MCP headers.

You accept mcp_headers but don’t use them when building tools; consider merging headers for each server.
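
One possible shape for the merge (a sketch only; the tool-dict schema mirrors the existing get_mcp_tools helper and OpenAI-style Responses API MCP tools, so the field names here are assumptions):

from typing import Any

def get_mcp_tools_with_headers(
    mcp_servers: list[Any],
    token: str,
    mcp_headers: dict[str, dict[str, str]],
) -> list[dict[str, Any]]:
    """Build MCP tool definitions, letting per-server headers override the default token."""
    tools: list[dict[str, Any]] = []
    for server in mcp_servers:
        headers = {"Authorization": f"Bearer {token}"}
        headers.update(mcp_headers.get(server.url, {}))  # per-server overrides win
        tools.append(
            {
                "type": "mcp",
                "server_label": server.name,
                "server_url": server.url,
                "headers": headers,
            }
        )
    return tools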

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d16d834 and b9ae829.

📒 Files selected for processing (8)
  • src/app/endpoints/query_v2.py (1 hunks)
  • src/app/endpoints/streaming_query_v2.py (1 hunks)
  • src/app/routers.py (3 hunks)
  • src/constants.py (1 hunks)
  • src/utils/suid.py (2 hunks)
  • tests/unit/app/endpoints/test_query_v2.py (1 hunks)
  • tests/unit/app/endpoints/test_streaming_query_v2.py (1 hunks)
  • tests/unit/app/test_routers.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • tests/unit/app/endpoints/test_streaming_query_v2.py
  • src/app/endpoints/query_v2.py
  • tests/unit/app/endpoints/test_query_v2.py
🧰 Additional context used
📓 Path-based instructions (8)
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)

Files:

  • src/app/routers.py
  • src/utils/suid.py
  • src/app/endpoints/streaming_query_v2.py
  • src/constants.py
src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code

Files:

  • src/app/routers.py
  • src/app/endpoints/streaming_query_v2.py
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(__name__) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @abstractmethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed

Files:

  • src/app/routers.py
  • src/utils/suid.py
  • src/app/endpoints/streaming_query_v2.py
  • src/constants.py
  • tests/unit/app/test_routers.py
src/{app/**/*.py,client.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use async def for I/O-bound operations and external API calls

Files:

  • src/app/routers.py
  • src/app/endpoints/streaming_query_v2.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling

Files:

  • src/app/endpoints/streaming_query_v2.py
src/constants.py

📄 CodeRabbit inference engine (CLAUDE.md)

Keep shared constants in a central src/constants.py with descriptive comments

Files:

  • src/constants.py
tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard

Files:

  • tests/unit/app/test_routers.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests

Files:

  • tests/unit/app/test_routers.py
🧬 Code graph analysis (2)
src/app/routers.py (1)
tests/unit/app/test_routers.py (1)
  • include_router (34-49)
src/app/endpoints/streaming_query_v2.py (8)
src/app/endpoints/query.py (5)
  • evaluate_model_hints (130-169)
  • is_transcripts_enabled (91-97)
  • persist_user_conversation_details (100-127)
  • select_model_and_provider_id (352-430)
  • validate_attachments_metadata (688-717)
src/utils/endpoints.py (4)
  • validate_conversation_ownership (41-61)
  • check_configuration_loaded (64-70)
  • get_system_prompt (73-113)
  • validate_model_provider_override (116-136)
src/app/endpoints/query_v2.py (2)
  • get_rag_tools (313-324)
  • get_mcp_tools (327-343)
src/app/endpoints/streaming_query.py (3)
  • format_stream_data (105-116)
  • stream_start_event (119-140)
  • stream_end_event (143-179)
src/client.py (2)
  • AsyncLlamaStackClientHolder (18-55)
  • get_client (49-55)
src/utils/mcp_headers.py (1)
  • mcp_headers_dependency (15-26)
src/utils/transcripts.py (1)
  • store_transcript (33-86)
src/utils/types.py (2)
  • TurnSummary (89-163)
  • ToolCallSummary (73-86)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: e2e_tests
🔇 Additional comments (8)
src/constants.py (1)

5-5: Align 0.2.22 bump across pins/docs/tests (or revert).

Make sure upstream 0.2.22 is available and all repo references match; otherwise gate checks and tests will drift. This was flagged earlier for 0.2.21.

Run:

#!/bin/bash
# Find version constants and hardcoded mentions
rg -nP -C2 'MINIMAL_SUPPORTED_LLAMA_STACK_VERSION|MAXIMAL_SUPPORTED_LLAMA_STACK_VERSION|0\.2\.(1[7-9]|2[0-9])'

# Check common pins/mentions
rg -nP -C2 'llama[-_ ]?stack|0\.2\.(1[7-9]|2[0-9])' \
  -g 'pyproject.toml' -g 'uv.lock' -g '*.md' -g '*.toml' -g '*.json'
src/app/endpoints/streaming_query_v2.py (5)

379-381: Remove incorrect accessor change
mcp_servers is defined as a property, so configuration.mcp_servers (not configuration.mcp_servers()) is correct and needs no change.

Likely an incorrect or invalid review comment.


116-119: BUG: model selection ignores resolved conversation context.

Pass the validated user_conversation instead of None.

-            *evaluate_model_hints(user_conversation=None, query_request=query_request),
+            *evaluate_model_hints(
+                user_conversation=user_conversation, query_request=query_request
+            ),

388-396: Gate server-side persistence to transcripts setting.

Avoid persisting when transcripts are disabled.

-        store=True,
+        store=is_transcripts_enabled(),

308-309: SSE requires proper media type and no-buffer headers.

Without text/event-stream, many clients won’t stream tokens.

-        return StreamingResponse(response_generator(response))
+        return StreamingResponse(
+            response_generator(response),
+            media_type="text/event-stream",
+            headers={
+                "Cache-Control": "no-cache",
+                "Connection": "keep-alive",
+                "X-Accel-Buffering": "no",
+            },
+        )

88-90: Avoid logging full configuration — risk of leaking secrets.

Log only safe fields (e.g., version). Don’t serialize the whole object.

-    logger.info("Llama stack config: %s", configuration.llama_stack_configuration)
+    try:
+        cfg = configuration.llama_stack_configuration()  # if accessor; else treat as attr
+    except TypeError:
+        cfg = configuration.llama_stack_configuration  # fallback
+    logger.debug("Llama Stack version: %s", getattr(cfg, "version", "n/a"))
src/utils/suid.py (1)

37-49: Critical: Don’t coerce resp-* IDs into UUIDs (remove zero-padding/UUID parse).

Response IDs are opaque; padding/truncation risks accepting malformed IDs and rejecting valid ones.

Apply:

-    # Handle Responses API IDs
-    if suid.startswith("resp-") or suid.startswith("resp_"):
-        token = suid[5:]
-        if not token:
-            return False
-        # If truncated (e.g., shell cut reduced length), pad to canonical UUID length
-        if len(token) < 36:
-            token = token + ("0" * (36 - len(token)))
-        try:
-            uuid.UUID(token)
-            return True
-        except (ValueError, TypeError):
-            return False
+    # Handle Responses API IDs – treat as opaque, minimally validated
+    if suid.startswith(("resp-", "resp_")):
+        token = suid[5:]
+        # non-empty token with safe charset
+        return bool(token) and all(ch.isalnum() or ch in "-_." for ch in token)
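
Illustrative expectations under that opaque-token check (a hypothetical pytest sketch, not taken from this PR's test suite):

import pytest

from utils.suid import check_suid

@pytest.mark.parametrize(
    "suid,expected",
    [
        ("123e4567-e89b-12d3-a456-426614174000", True),  # plain UUID still accepted
        ("resp-abc123", True),    # opaque Responses API ID
        ("resp_abc.123", True),   # underscore prefix, dot in token allowed
        ("resp_", False),         # empty token after prefix
        ("resp-abc/123", False),  # unsafe character rejected
        ("", False),
    ],
)
def test_check_suid(suid: str, expected: bool) -> None:
    assert check_suid(suid) is expected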
src/app/routers.py (1)

45-46: src/app/routers.py mounts conversations.router under both /v1 (line 39) and /v3 (line 45) — confirm this reuse is intentional and adjust operationIds or tags to prevent duplicate OpenAPI operations.
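
If the double mount is intentional, one way to keep operation ids unique (a sketch; include_router accepts generate_unique_id_function in recent FastAPI releases):

from fastapi import FastAPI
from fastapi.routing import APIRoute

from app.endpoints import conversations

def v3_operation_id(route: APIRoute) -> str:
    # Prefix ids so the /v3 mount does not collide with the /v1 mount.
    return f"v3_{route.name}"

app = FastAPI()
app.include_router(conversations.router, prefix="/v1", tags=["conversations"])
app.include_router(
    conversations.router,
    prefix="/v3",
    tags=["conversations_v3"],
    generate_unique_id_function=v3_operation_id,
)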

@luis5tb luis5tb changed the title [WIP] Responses v2 Responses v2 support Sep 29, 2025
@omertuc
Copy link
Contributor

omertuc commented Sep 29, 2025

Let's hold it until we have a conversations API

@luis5tb
Copy link
Author

luis5tb commented Oct 6, 2025

Let's hold it until we have a conversations API

Conversations (initial support) PR was merged! \o/
