@gallettilance gallettilance commented Sep 9, 2025

Goal: allow OLS to use the LCORE streaming query endpoint with only minimal filtering.

Today, OLS cannot consume the original lightspeed-stack streaming query endpoint with filtering alone. The following changes are needed for OLS compatibility.

Tool Event Ambiguity Problem

Original lightspeed-stack format:

// Tool call initiation
{"event": "tool_call", "data": {"role": "tool_execution", "token": {"tool_name": "search", "arguments": {...}}}}

// Tool call result  
{"event": "tool_call", "data": {"role": "tool_execution", "token": {"tool_name": "search", "response": "..."}}}

OLS expected format:

// Tool call initiation
{"event": "tool_call", "data": {"tool_name": "search", "arguments": {...}}}

// Tool call result
{"event": "tool_result", "data": {"tool_name": "search", "response": "..."}}

Both events carry the same event type and the same role field, so a filter cannot distinguish a tool call from its result without understanding the tool execution lifecycle. It is possible (by watching for paired tool_call events), but awkward; a sketch of such a filter follows.
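
For illustration, a minimal filter sketch (not code from this PR) that tells the two frames apart by payload shape rather than by event type; the helper name and the key-presence heuristic are assumptions:

import json

def reshape_tool_event(frame: dict) -> dict:
    """Map a lightspeed-stack tool_call frame onto the OLS event shape."""
    token = frame["data"]["token"]
    if "response" in token:
        # Second occurrence: the tool has already run, so this is really a result.
        return {"event": "tool_result", "data": token}
    # First occurrence: the tool is being invoked with its arguments.
    return {"event": "tool_call", "data": token}

raw = (
    '{"event": "tool_call", "data": {"role": "tool_execution", '
    '"token": {"tool_name": "search", "response": "..."}}}'
)
print(reshape_tool_event(json.loads(raw)))
# {'event': 'tool_result', 'data': {'tool_name': 'search', 'response': '...'}}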

Data Structure Mismatches

Original format:

{
  "event": "end",
  "data": {
    "referenced_documents": [...],
    "truncated": null,  // Should be boolean
    "input_tokens": 0,
    "output_tokens": 0
  }
}

OLS expected:

{
  "event": "end", 
  "data": {
    "referenced_documents": [...],
    "truncated": false,  // Boolean
    "input_tokens": 100,
    "output_tokens": 50
  }
}

These structural differences require data transformation, not just filtering. Again, nothing that couldn't be solved with some additional code on the OLS side, but it is simpler to fix it here; a sketch of the transformation follows.
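
To make the required transformation concrete, a throwaway sketch (not the OLS or lightspeed-stack code; the helper name and the token counts are invented) of what a filter would have to do to the end event:

def adapt_end_event(frame: dict, input_tokens: int, output_tokens: int) -> dict:
    """Coerce a lightspeed-stack end event into the shape OLS expects."""
    data = dict(frame["data"])  # copy rather than mutate the original frame
    data["truncated"] = bool(data.get("truncated"))  # None becomes False
    data["input_tokens"] = input_tokens
    data["output_tokens"] = output_tokens
    return {"event": "end", "data": data}

original = {
    "event": "end",
    "data": {
        "referenced_documents": [],
        "truncated": None,
        "input_tokens": 0,
        "output_tokens": 0,
    },
}
print(adapt_end_event(original, input_tokens=100, output_tokens=50))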

Media Type Support

OLS supports both application/json and text/plain, so text/plain was added as a supported media type.
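
For context, a rough client sketch showing how the format is selected via media_type (the /streaming_query path and the media_type field come from this PR; the host, port, and any other payload fields are assumptions):

import json
import urllib.request

# Hypothetical client: ask for plain-text streaming instead of JSON SSE frames.
payload = {"query": "Tell me about Kubernetes", "media_type": "text/plain"}
req = urllib.request.Request(
    "http://localhost:8080/streaming_query",  # host and port assumed
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    for line in resp:
        # text/plain yields raw tokens; application/json yields "data: {...}" SSE frames
        print(line.decode(), end="")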

Error Handling

Original format:

{"event": "error", "data": {"id": 0, "token": "Generic error message"}}

OLS expected:

// PromptTooLongError
{"event": "error", "data": {"status_code": 413, "response": "Prompt is too long", "cause": "..."}}

// Generic LLM error  
{"event": "error", "data": {"response": "Internal server error", "cause": "..."}}

Let's just provide better errors for OLS.
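
As a purely illustrative sketch, the error frames above could be assembled like this (the PR's real helpers are prompt_too_long_error and generic_llm_error; their exact signatures are not reproduced here):

import json

def sse(payload: dict) -> str:
    """Wrap a payload as a single SSE data frame."""
    return f"data: {json.dumps(payload)}\n\n"

def prompt_too_long_frame(cause: str) -> str:
    """Error frame for prompts that exceed the model's context window."""
    return sse({
        "event": "error",
        "data": {"status_code": 413, "response": "Prompt is too long", "cause": cause},
    })

def generic_error_frame(cause: str) -> str:
    """Fallback error frame for unexpected LLM failures."""
    return sse({
        "event": "error",
        "data": {"response": "Internal server error", "cause": cause},
    })

print(prompt_too_long_frame("context window exceeded"))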

Role Field

The role field is not required by OLS or Ansible, so I removed it.

Summary by CodeRabbit

  • New Features

    • Streaming now supports selectable media types (application/json, text/plain) and emits OLS-compatible streaming events (token, tool_call, tool_result) with conversation ID propagation.
  • Bug Fixes

    • Media-type-aware start/end and error outputs, improved prompt-too-long and generic LLM error formatting.
    • Request validation now enforces supported media_type values.
  • Tests

    • Expanded coverage for media-type formats, streaming events, error cases, and public streaming helpers.


coderabbitai bot commented Sep 9, 2025

Walkthrough

Adds media-type-aware streaming (application/json | text/plain), OLS-style event types (token, tool_call, tool_result), a central stream_event formatter, media-type-aware error/end handling, and propagates media_type and conversation_id across streaming handlers and responses.

Changes

Cohort / File(s), with summary of changes:

  • Streaming endpoint (media-type + OLS events): src/app/endpoints/streaming_query.py
    Added LLM_TOKEN_EVENT, LLM_TOOL_CALL_EVENT, LLM_TOOL_RESULT_EVENT; added stream_event, generic_llm_error, prompt_too_long_error; made event builders/handlers media_type-aware; updated many _handle_* signatures and stream_build_event/stream_end_event; unified formatting via stream_event; streaming response now carries explicit media_type and propagates conversation_id.
  • Media type constants: src/constants.py
    Added MEDIA_TYPE_JSON = "application/json" and MEDIA_TYPE_TEXT = "text/plain".
  • Request model validation: src/models/requests.py
    QueryRequest.media_type examples/description updated to use constants; validate_media_type now raises ValueError for unsupported media types (enforces allowed values).
  • Endpoint tests (OLS + media-type): tests/unit/app/endpoints/test_streaming_query.py
    Tests updated to import new helpers/constants; added coverage for stream_event/stream_end_event/error formatting in both JSON and text formats; adjusted expectations for OLS event shapes, conversation_id behavior, and MCP header propagation.
  • Request model tests: tests/unit/models/requests/test_query_request.py
    Test expectation changed to reflect stricter media_type validation (no warning path for unsupported values).

Sequence Diagram(s)

sequenceDiagram
  participant C as Client
  participant API as /streaming_query
  participant G as Generator
  participant H as stream_event / handlers

  C->>API: POST /streaming_query (media_type)
  API->>G: start generator(media_type, conversation_id)
  Note right of API: return StreamingResponse(media_type)

  G->>H: stream_event({start meta}, "start", media_type)
  H-->>G: formatted start event (JSON / text)
  G-->>C: emit start

  loop tokens / tool calls / results
    G->>H: stream_event(data, event_type, media_type)
    H-->>G: formatted event (JSON / text)
    G-->>C: emit event
  end

  alt error
    G->>H: generic_llm_error(err, media_type)
    H-->>G: formatted error event
    G-->>C: emit error + end
  else success
    G->>H: stream_end_event(metadata, media_type)
    H-->>G: formatted end event
    G-->>C: emit end
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • are-ces
  • jrobertboos
  • manstis
  • tisnik

Poem

A rabbit hums beside the stream,
Tokens hop and JSON gleam,
Tool calls tap, results arrive,
Text or JSON — events alive,
Hop, emit, and end — the stream's alive! 🐇✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title succinctly captures the primary objective of the changeset (ensuring OLS streaming query compatibility) without extraneous details, making it clear and specific to reviewers scanning the pull request history.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 98.00%, which is sufficient; the required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/app/endpoints/streaming_query.py (2)

1-1: CI hygiene: run Black, remove trailing whitespace, and address broad exceptions

  • Run black . (e.g. pip install black then black .) to fix formatting.
  • Remove trailing whitespace in src/app/endpoints/streaming_query.py at lines 256, 677, and 680.
  • For each except Exception (lines 520, 705, 714), either narrow the caught exception to a more specific type or add # pylint: disable=broad-except with a brief justification for why broad-catching is unavoidable.

389-433: Swap out direct format_stream_data calls in streaming handlers
The _handle_shield_event, _handle_inference_event, and _handle_tool_execution_event functions still emit JSON frames via format_stream_data; refactor each to use stream_event so that text/plain yields raw token strings and all other media types yield JSON frames.
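
As a rough sketch of that dispatcher, assuming stream_event takes (data, event_type, media_type) as the call sites in this review suggest (the text/plain branch shown here is an assumption, not the file's actual code):

import json

MEDIA_TYPE_JSON = "application/json"
MEDIA_TYPE_TEXT = "text/plain"

def stream_event(data: dict, event_type: str, media_type: str) -> str:
    """Format one streaming event for the negotiated media type."""
    if media_type == MEDIA_TYPE_TEXT:
        if event_type == "token":
            return str(data.get("token", ""))  # raw token text, no framing
        return json.dumps(data)  # tool events as compact JSON lines
    # application/json: wrap everything as an SSE frame.
    return f"data: {json.dumps({'event': event_type, 'data': data})}\n\n"

print(stream_event({"id": 0, "token": "Hello"}, "token", MEDIA_TYPE_JSON))
# data: {"event": "token", "data": {"id": 0, "token": "Hello"}}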

🧹 Nitpick comments (5)
src/models/responses.py (1)

553-576: Tighten typing for StreamedChunk.type and future-proof the shape

Constrain type to known literals and keep room for extension. This improves validation and OpenAPI docs.

-class StreamedChunk(BaseModel):
+class StreamedChunk(BaseModel):
@@
-    type: str = Field(
+    # Known chunk types; extend as protocol evolves.
+    type: Literal["text", "tool_call", "tool_result", "end"] = Field(
         description="The type of chunk",
-        examples=["text", "tool_call", "tool_result", "end"],
+        examples=["text", "tool_call", "tool_result", "end"],
     )

Optional: if you plan different payloads per type, consider a Tagged Union (discriminated union) with type as the discriminator to enforce event-specific schemas.
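
A sketch of what that discriminated union could look like in Pydantic v2 (the field names other than type are hypothetical):

from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field, TypeAdapter

class TextChunk(BaseModel):
    type: Literal["text"] = "text"
    token: str

class ToolCallChunk(BaseModel):
    type: Literal["tool_call"] = "tool_call"
    tool_name: str
    arguments: dict

class ToolResultChunk(BaseModel):
    type: Literal["tool_result"] = "tool_result"
    tool_name: str
    response: str

class EndChunk(BaseModel):
    type: Literal["end"] = "end"
    truncated: bool = False

StreamedChunk = Annotated[
    Union[TextChunk, ToolCallChunk, ToolResultChunk, EndChunk],
    Field(discriminator="type"),
]

# Validation picks the right model based on the "type" field.
chunk = TypeAdapter(StreamedChunk).validate_python(
    {"type": "tool_result", "tool_name": "search", "response": "..."}
)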

src/app/endpoints/streaming_query.py (4)

79-101: Start event should be media-type aware

For text/plain, emitting a JSON “start” frame is unexpected. Make stream_start_event media-type aware and no-op for text.

-def stream_start_event(conversation_id: str) -> str:
+def stream_start_event(conversation_id: str, media_type: str) -> str:
@@
-    return format_stream_data(
+    if media_type == "text/plain":
+        return ""
+    return format_stream_data(
         {
             "event": "start",
             "data": {"conversation_id": conversation_id},
         }
     )

And call:

-            yield stream_start_event(conversation_id)
+            yield stream_start_event(conversation_id, media_type)

244-271: Align error payload with media type and avoid duplicate logging

You already log at error+exception; one is enough. Also, for JSON, consider nesting under detail to mirror ErrorResponse.

-    logger.error("Error while obtaining answer for user question")
-    logger.exception(error)
+    logger.exception("Error while obtaining answer for user question")
@@
-    return format_stream_data(
-        {
-            "event": "error",
-            "data": {
-                "response": response,
-                "cause": cause,
-            },
-        }
-    )
+    return format_stream_data({
+        "event": "error",
+        "data": {"detail": {"response": response, "cause": cause}},
+    })

Note: only apply if downstream expects detail. Otherwise keep current shape.


440-553: Tool execution: parse metadata safely and emit via stream_event

  • Use json.loads first, fallback to ast.literal_eval only if needed.
  • Emit using stream_event for media-type correctness.
  • Minor: use non-greedy metadata regex to avoid over-capture.
-                        for match in METADATA_PATTERN.findall(text_content_item.text):
+                        for match in METADATA_PATTERN.findall(text_content_item.text):
                             try:
-                                meta = ast.literal_eval(match)
+                                try:
+                                    meta = json.loads(match)
+                                except Exception:
+                                    meta = ast.literal_eval(match)
                                 if "document_id" in meta:
                                     metadata_map[meta["document_id"]] = meta
                             except Exception:  # pylint: disable=broad-except
                                 logger.debug("An exception was thrown in processing %s", match)
@@
-                yield format_stream_data({
-                    "event": LLM_TOOL_RESULT_EVENT,
-                    "data": {"id": chunk_id, "token": {"tool_name": r.tool_name, "summary": summary}},
-                })
+                yield stream_event({"id": chunk_id, "token": {"tool_name": r.tool_name, "summary": summary}}, LLM_TOOL_RESULT_EVENT, media_type)

And consider tightening the regex (declare near definition):

-METADATA_PATTERN = re.compile(r"\nMetadata: (\{.+})\n")
+METADATA_PATTERN = re.compile(r"\nMetadata:\s*(\{.*?\})\s*\n", re.DOTALL)

349-371: Use the event constant and drop role field in shield events

To align with OLS naming and recent test removals, switch "event": "token" to LLM_TOKEN_EVENT and omit role from payloads.

-                    "event": "token",
+                    "event": LLM_TOKEN_EVENT,
                     "data": {
                         "id": chunk_id,
-                        "role": chunk.event.payload.step_type,
                         "token": "No Violation",
                     },

Apply the same removal in the violation branch.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 528e0b9 and cfa516a.

📒 Files selected for processing (3)
  • src/app/endpoints/streaming_query.py (15 hunks)
  • src/models/responses.py (1 hunks)
  • tests/unit/app/endpoints/test_streaming_query.py (0 hunks)
💤 Files with no reviewable changes (1)
  • tests/unit/app/endpoints/test_streaming_query.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/app/endpoints/streaming_query.py (4)
src/models/responses.py (1)
  • StreamedChunk (553-575)
src/utils/endpoints.py (1)
  • get_system_prompt (71-102)
src/metrics/utils.py (1)
  • update_llm_token_count_from_turn (60-77)
src/utils/types.py (1)
  • append_tool_calls_from_llama (65-78)
🪛 GitHub Actions: Black
src/app/endpoints/streaming_query.py

[error] 1-1: Black formatting check failed. 1 file would be reformatted: src/app/endpoints/streaming_query.py. Command 'uv tool run black --check .' exited with code 1. Run 'black --write .' to fix formatting.

🪛 GitHub Actions: Ruff
src/app/endpoints/streaming_query.py

[error] 32-32: Lint step 'uv tool run ruff check . --per-file-ignores=tests/:S101 --per-file-ignores=scripts/:S101' failed. F401: 'models.responses.StreamedChunk' imported but unused in src/app/endpoints/streaming_query.py:32. 1 fixable with 'ruff --fix'.

🪛 GitHub Actions: Pyright
src/app/endpoints/streaming_query.py

[error] 702-702: Step: uv run pyright src - Pyright error: Cannot access attribute 'usage' for class 'Turn'. Attribute 'usage' is unknown (reportAttributeAccessIssue).


[error] 703-703: Step: uv run pyright src - Pyright error: Cannot access attribute 'usage' for class 'Turn'. Attribute 'usage' is unknown (reportAttributeAccessIssue).


[error] 704-704: Step: uv run pyright src - Pyright error: Cannot access attribute 'usage' for class 'Turn'. Attribute 'usage' is unknown (reportAttributeAccessIssue).

🪛 GitHub Actions: Python linter
src/app/endpoints/streaming_query.py

[warning] 256-256: Trailing whitespace (trailing-whitespace).


[warning] 677-677: Trailing whitespace (trailing-whitespace).


[warning] 680-680: Trailing whitespace (trailing-whitespace).


[warning] 703-703: Line too long (105/100) (line-too-long).


[warning] 704-704: Line too long (110/100) (line-too-long).


[warning] 714-714: Catching too general exception (broad-exception-caught).


[warning] 584-584: Too many statements (57/50) (too-many-statements).


[warning] 32-32: Unused import: StreamedChunk from models.responses (unused-import).

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-pr
  • GitHub Check: e2e_tests
🔇 Additional comments (2)
src/app/endpoints/streaming_query.py (2)

52-56: LGTM: centralized OLS event constants

Consistent event naming reduces drift. No changes needed.


761-766: LGTM: explicit media_type on StreamingResponse

Good to surface the negotiated media type here.


@coderabbitai coderabbitai bot left a comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/unit/models/requests/test_query_request.py (1)

137-156: Add test coverage for invalid media_type validation.

The test correctly verifies that no warning is logged for supported media types. However, it's missing coverage for the new strict validation that raises ValueError for unsupported media types.

Add a test case to verify the new validation behavior:

def test_validate_media_type_invalid(self) -> None:
    """Test the validate_media_type method with invalid media type."""
    with pytest.raises(
        ValueError, 
        match="media_type must be either 'application/json' or 'text/plain'"
    ):
        QueryRequest(
            query="Tell me about Kubernetes",
            provider="OpenAI",
            model="gpt-3.5-turbo",
            media_type="application/xml",
        )
🧹 Nitpick comments (3)
tests/unit/models/requests/test_query_request.py (1)

148-148: Consider using the new MEDIA_TYPE_TEXT constant.

The test hardcodes "text/plain" instead of using the new MEDIA_TYPE_TEXT constant from src/constants.py. Using the constant would improve maintainability and ensure consistency with the validation logic.

Apply this diff:

+from constants import MEDIA_TYPE_TEXT
+
 class TestQueryRequest:
         qr = QueryRequest(
             query="Tell me about Kubernetes",
             provider="OpenAI",
             model="gpt-3.5-turbo",
-            media_type="text/plain",
+            media_type=MEDIA_TYPE_TEXT,
         )
tests/unit/app/endpoints/test_streaming_query.py (1)

1657-1658: Address pylint warnings: move imports to module level.

While the import-outside-toplevel pattern works, it triggers multiple pylint C0415 warnings throughout the new test classes (lines 1657-1920). Consider moving these imports to the module level for consistency with the project's coding guidelines.

Apply this pattern to all affected test methods:

+from app.endpoints.streaming_query import (
+    stream_event,
+    stream_end_event,
+    prompt_too_long_error,
+    generic_llm_error,
+    LLM_TOKEN_EVENT,
+    LLM_TOOL_CALL_EVENT,
+    LLM_TOOL_RESULT_EVENT,
+)
+from constants import MEDIA_TYPE_JSON, MEDIA_TYPE_TEXT

 class TestOLSStreamEventFormatting:
     """Test the stream_event function for both media types (OLS compatibility)."""

     def test_stream_event_json_token(self):
         """Test token event formatting for JSON media type."""
-        from app.endpoints.streaming_query import stream_event, LLM_TOKEN_EVENT
-        from constants import MEDIA_TYPE_JSON
src/app/endpoints/streaming_query.py (1)

665-665: Consider refactoring to reduce function complexity.

The streaming_query_endpoint_handler function has 54 statements, exceeding the recommended limit of 50 (R0915). While the current implementation is functional, consider extracting helper functions for:

  • User conversation validation (lines 703-720)
  • Response generation and metrics (lines 722-831)

This is optional and can be deferred to a future refactor.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cfa516a and c2a1ccc.

📒 Files selected for processing (5)
  • src/app/endpoints/streaming_query.py (22 hunks)
  • src/constants.py (1 hunks)
  • src/models/requests.py (2 hunks)
  • tests/unit/app/endpoints/test_streaming_query.py (10 hunks)
  • tests/unit/models/requests/test_query_request.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (10)
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)

Files:

  • src/constants.py
  • src/models/requests.py
  • src/app/endpoints/streaming_query.py
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(__name__) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @abstractmethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed

Files:

  • src/constants.py
  • tests/unit/models/requests/test_query_request.py
  • tests/unit/app/endpoints/test_streaming_query.py
  • src/models/requests.py
  • src/app/endpoints/streaming_query.py
src/constants.py

📄 CodeRabbit inference engine (CLAUDE.md)

Keep shared constants in a central src/constants.py with descriptive comments

Files:

  • src/constants.py
tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard

Files:

  • tests/unit/models/requests/test_query_request.py
  • tests/unit/app/endpoints/test_streaming_query.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests

Files:

  • tests/unit/models/requests/test_query_request.py
  • tests/unit/app/endpoints/test_streaming_query.py
src/{models/**/*.py,configuration.py}

📄 CodeRabbit inference engine (CLAUDE.md)

src/{models/**/*.py,configuration.py}: Use @field_validator and @model_validator for custom validation in Pydantic models
Use precise type hints in configuration (e.g., Optional[FilePath], PositiveInt, SecretStr)

Files:

  • src/models/requests.py
src/models/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/models/**/*.py: Pydantic models: use BaseModel for data models and extend ConfigurationBase for configuration
Use @model_validator and @field_validator for Pydantic model validation

Files:

  • src/models/requests.py
src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code

Files:

  • src/app/endpoints/streaming_query.py
src/{app/**/*.py,client.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use async def for I/O-bound operations and external API calls

Files:

  • src/app/endpoints/streaming_query.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling

Files:

  • src/app/endpoints/streaming_query.py
🧬 Code graph analysis (2)
tests/unit/app/endpoints/test_streaming_query.py (2)
src/app/endpoints/streaming_query.py (5)
  • streaming_query_endpoint_handler (665-852)
  • stream_event (203-228)
  • stream_end_event (153-200)
  • prompt_too_long_error (311-333)
  • generic_llm_error (336-359)
src/models/requests.py (1)
  • QueryRequest (72-223)
src/app/endpoints/streaming_query.py (1)
src/models/responses.py (1)
  • ToolCall (45-50)
🪛 GitHub Actions: Python linter
tests/unit/app/endpoints/test_streaming_query.py

[warning] 1788-1788: Pylint C0301: Line too long (106/100) (line-too-long)


[warning] 1657-1657: Pylint C0415: Import outside toplevel (app.endpoints.streaming_query.stream_event, app.endpoints.streaming_query.LLM_TOKEN_EVENT) (import-outside-toplevel)


[warning] 1658-1658: Pylint C0415: Import outside toplevel (constants.MEDIA_TYPE_JSON) (import-outside-toplevel)


[warning] 1668-1668: Pylint C0415: Import outside toplevel (app.endpoints.streaming_query.stream_event, app.endpoints.streaming_query.LLM_TOKEN_EVENT) (import-outside-toplevel)


[warning] 1669-1669: Pylint C0415: Import outside toplevel (constants.MEDIA_TYPE_TEXT) (import-outside-toplevel)


[warning] 1678-1678: Pylint C0415: Import outside toplevel (app.endpoints.streaming_query.stream_event, app.endpoints.streaming_query.LLM_TOOL_CALL_EVENT) (import-outside-toplevel)


[warning] 1679-1679: Pylint C0415: Import outside toplevel (constants.MEDIA_TYPE_JSON) (import-outside-toplevel)


[warning] 1695-1695: Pylint C0415: Import outside toplevel (app.endpoints.streaming_query.stream_event, app.endpoints.streaming_query.LLM_TOOL_CALL_EVENT) (import-outside-toplevel)


[warning] 1696-1696: Pylint C0415: Import outside toplevel (constants.MEDIA_TYPE_TEXT) (import-outside-toplevel)


[warning] 1712-1712: Pylint C0415: Import outside toplevel (app.endpoints.streaming_query.stream_event, app.endpoints.streaming_query.LLM_TOOL_RESULT_EVENT) (import-outside-toplevel)


[warning] 1713-1713: Pylint C0415: Import outside toplevel (constants.MEDIA_TYPE_JSON) (import-outside-toplevel)


[warning] 1726-1726: Pylint C0415: Import outside toplevel (app.endpoints.streaming_query.stream_event, app.endpoints.streaming_query.LLM_TOOL_RESULT_EVENT) (import-outside-toplevel)


[warning] 1727-1727: Pylint C0415: Import outside toplevel (constants.MEDIA_TYPE_TEXT) (import-outside-toplevel)


[warning] 1740-1740: Pylint C0415: Import outside toplevel (app.endpoints.streaming_query.stream_event) (import-outside-toplevel)


[warning] 1741-1741: Pylint C0415: Import outside toplevel (constants.MEDIA_TYPE_TEXT) (import-outside-toplevel)


[warning] 1754-1754: Pylint C0415: Import outside toplevel (app.endpoints.streaming_query.stream_end_event) (import-outside-toplevel)


[warning] 1755-1755: Pylint C0415: Import outside toplevel (constants.MEDIA_TYPE_JSON) (import-outside-toplevel)


[warning] 1779-1779: Pylint C0415: Import outside toplevel (app.endpoints.streaming_query.stream_end_event) (import-outside-toplevel)


[warning] 1780-1780: Pylint C0415: Import outside toplevel (constants.MEDIA_TYPE_JSON) (import-outside-toplevel)


[warning] 1793-1793: Pylint C0415: Import outside toplevel (app.endpoints.streaming_query.prompt_too_long_error) (import-outside-toplevel)


[warning] 1794-1794: Pylint C0415: Import outside toplevel (constants.MEDIA_TYPE_JSON) (import-outside-toplevel)


[warning] 1807-1807: Pylint C0415: Import outside toplevel (app.endpoints.streaming_query.prompt_too_long_error) (import-outside-toplevel)


[warning] 1808-1808: Pylint C0415: Import outside toplevel (constants.MEDIA_TYPE_JSON) (import-outside-toplevel)


[warning] 1819-1819: Pylint C0415: Import outside toplevel (app.endpoints.streaming_query.generic_llm_error) (import-outside-toplevel)


[warning] 1820-1820: Pylint C0415: Import outside toplevel (constants.MEDIA_TYPE_JSON) (import-outside-toplevel)


[warning] 1832-1832: Pylint C0415: Import outside toplevel (app.endpoints.streaming_query.generic_llm_error) (import-outside-toplevel)


[warning] 1833-1833: Pylint C0415: Import outside toplevel (constants.MEDIA_TYPE_JSON) (import-outside-toplevel)


[warning] 1846-1846: Pylint C0415: Import outside toplevel (app.endpoints.streaming_query.generic_llm_error) (import-outside-toplevel)


[warning] 1847-1847: Pylint C0415: Import outside toplevel (constants.MEDIA_TYPE_JSON) (import-outside-toplevel)


[warning] 1860-1860: Pylint W0621: Redefining name 'QueryRequest' from outer scope (line 47) (redefined-outer-name)


[warning] 1860-1860: Pylint W0404: Reimport 'QueryRequest' (imported line 47) (reimported)


[warning] 1860-1860: Pylint C0415: Import outside toplevel (models.requests.QueryRequest) (import-outside-toplevel)


[warning] 1875-1875: Pylint C0415: Import outside toplevel (app.endpoints.streaming_query.stream_event, app.endpoints.streaming_query.LLM_TOKEN_EVENT, app.endpoints.streaming_query.LLM_TOOL_CALL_EVENT, app.endpoints.streaming_query.LLM_TOOL_RESULT_EVENT) (import-outside-toplevel)


[warning] 1881-1881: Pylint C0415: Import outside toplevel (constants.MEDIA_TYPE_JSON) (import-outside-toplevel)


[warning] 1919-1919: Pylint C0415: Import outside toplevel (app.endpoints.streaming_query.stream_end_event) (import-outside-toplevel)


[warning] 1920-1920: Pylint C0415: Import outside toplevel (constants.MEDIA_TYPE_JSON) (import-outside-toplevel)

src/app/endpoints/streaming_query.py

[warning] 844-844: Pylint W0718: Catching too general exception Exception (broad-exception-caught)


[warning] 665-665: Pylint R0915: Too many statements (54/50) (too-many-statements)

🔇 Additional comments (17)
src/constants.py (1)

58-60: LGTM!

The media type constants are well-defined with clear naming and appropriate placement. The comment accurately describes their purpose for streaming responses.

src/models/requests.py (1)

147-151: LGTM!

The field description and examples are clear and accurate, properly documenting the supported media types.

tests/unit/app/endpoints/test_streaming_query.py (5)

186-192: LGTM: Error handling now returns StreamingResponse.

The test correctly validates the updated error handling behavior where connection errors return a StreamingResponse with application/json media type instead of raising an HTTPException. This aligns with the OLS-compatible error handling introduced in the main endpoint.


846-848: LGTM: Test updated for OLS event type changes.

The assertion correctly expects "event": "token" instead of the previous turn_complete event type, aligning with the OLS-compatible event structure.


880-881: LGTM: Role field removed for OLS compatibility.

The assertion correctly validates that the role field is no longer present in the event payload, as documented in the comment on line 880.


1103-1104: LGTM: Heartbeat now uses token event.

The test correctly expects a "token" event type with "heartbeat" value, consistent with the OLS-compatible event structure.


1647-1936: Excellent OLS compatibility test coverage.

The new test classes provide comprehensive coverage of:

  • Media type-aware event formatting (JSON vs text/plain)
  • End event structure and document references
  • Error handling for both media types
  • Integration tests validating the complete OLS event structure

The test assertions are thorough and correctly validate the absence of the role field and the presence of expected OLS event types.

src/app/endpoints/streaming_query.py (10)

109-112: LGTM: OLS event type constants are well-defined.

The new event type constants centralize event naming and improve maintainability. The comment clearly indicates their purpose for OLS compatibility.


153-200: LGTM: Media type handling correctly implemented.

The function properly formats end events for both JSON and text/plain media types. The text format provides a clean, human-readable list of referenced documents.

Note: A past review comment suggests moving available_quotas from the top level into the data object for consistency with other event payloads (line 198).


203-228: LGTM: Centralized event formatting with proper error handling.

The stream_event function provides a clean abstraction for media type-aware event formatting. The error logging for unknown event types (line 221) helps with debugging and monitoring.


231-277: LGTM: media_type properly propagated through event handlers.

The stream_build_event signature correctly adds the media_type parameter, and all handler invocations consistently pass it through. This ensures media type-aware formatting throughout the event pipeline.


311-333: LGTM: Appropriate error handling with correct status code.

The prompt_too_long_error function correctly uses HTTP status code 413 (Request Entity Too Large) for oversized prompts and provides clear error messages for both media types.


336-359: LGTM: Generic error handler with proper logging.

The generic_llm_error function appropriately logs exceptions (lines 346-347) and provides consistent error responses for both media types.


365-660: LGTM: Consistent media type handling across all event handlers.

All event handlers have been systematically updated to:

  • Accept the media_type parameter
  • Use the stream_event abstraction instead of direct format_stream_data calls
  • Utilize the new event type constants (LLM_TOKEN_EVENT, LLM_TOOL_CALL_EVENT, LLM_TOOL_RESULT_EVENT)

The refactoring is thorough and maintains consistency across the entire event emission pipeline.


759-789: LGTM: Media type correctly propagated through response generator.

The response generator properly:

  • Determines media type from the request with appropriate fallback (line 760)
  • Conditionally sends the start event only for JSON responses (lines 763-764)
  • Passes media type to all event emission functions (lines 784, 789)

844-852: Broad exception catch is appropriate for error boundary.

The broad Exception catch on line 844 serves as a top-level error boundary to ensure any unexpected errors are gracefully converted to streaming error responses rather than causing unhandled exceptions. The pylint W0718 warning can be safely ignored in this context.

The error response correctly:

  • Determines the appropriate media type (line 846)
  • Formats the error using generic_llm_error (line 847)
  • Returns a StreamingResponse with proper media type (line 852)

829-831: LGTM: StreamingResponse properly includes media_type.

The response correctly determines and propagates the media type to the StreamingResponse constructor, ensuring the HTTP Content-Type header matches the actual response format.


@coderabbitai coderabbitai bot left a comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
tests/unit/app/endpoints/test_streaming_query.py (1)

1668-1668: Fix line length.

Line exceeds the 100-character limit (106 characters).

Apply this diff to fix:

-        expected = 'data: {"event": "token", "data": {"id": 0, "role": "inference", "token": "Hello"}}\n\n'  # pylint: disable=line-too-long
+        expected = (
+            'data: {"event": "token", "data": '
+            '{"id": 0, "role": "inference", "token": "Hello"}}\n\n'
+        )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c2a1ccc and fca0b38.

📒 Files selected for processing (5)
  • src/app/endpoints/streaming_query.py (22 hunks)
  • src/constants.py (1 hunks)
  • src/models/requests.py (2 hunks)
  • tests/unit/app/endpoints/test_streaming_query.py (11 hunks)
  • tests/unit/models/requests/test_query_request.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/constants.py
  • tests/unit/models/requests/test_query_request.py
  • src/models/requests.py
🧰 Additional context used
📓 Path-based instructions (7)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(__name__) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @abstractmethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed

Files:

  • tests/unit/app/endpoints/test_streaming_query.py
  • src/app/endpoints/streaming_query.py
tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard

Files:

  • tests/unit/app/endpoints/test_streaming_query.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests

Files:

  • tests/unit/app/endpoints/test_streaming_query.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)

Files:

  • src/app/endpoints/streaming_query.py
src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code

Files:

  • src/app/endpoints/streaming_query.py
src/{app/**/*.py,client.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use async def for I/O-bound operations and external API calls

Files:

  • src/app/endpoints/streaming_query.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling

Files:

  • src/app/endpoints/streaming_query.py
🧬 Code graph analysis (2)
tests/unit/app/endpoints/test_streaming_query.py (2)
src/app/endpoints/streaming_query.py (5)
  • stream_event (203-228)
  • stream_end_event (153-200)
  • prompt_too_long_error (311-333)
  • generic_llm_error (336-359)
  • streaming_query_endpoint_handler (675-864)
src/models/requests.py (1)
  • QueryRequest (72-223)
src/app/endpoints/streaming_query.py (2)
src/models/responses.py (1)
  • ToolCall (45-50)
src/models/config.py (1)
  • Action (327-366)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: e2e_tests
  • GitHub Check: build-pr
🔇 Additional comments (9)
tests/unit/app/endpoints/test_streaming_query.py (2)

43-57: LGTM! Imports properly support OLS compatibility testing.

The new imports for streaming event helpers, error formatters, event type constants, and media type constants are correctly sourced and align with the OLS compatibility requirements described in the PR objectives.


1660-1930: Excellent OLS compatibility test coverage!

The test suite comprehensively validates OLS-compatible streaming behavior across multiple dimensions:

  • Event formatting for both JSON and text media types (token, tool_call, tool_result)
  • End event structure with referenced documents and quotas
  • Error handling for both prompt-too-long and generic errors
  • Media type validation
  • Complete event structure verification

This thorough coverage will help prevent regressions in OLS compatibility.

src/app/endpoints/streaming_query.py (7)

7-7: LGTM! Imports and constants properly support OLS compatibility.

The additions appropriately support the new OLS-compatible streaming functionality:

  • AsyncGenerator enables proper typing for async error generators
  • Media type constants provide clear semantic values
  • Event type constants follow clear naming conventions

Also applies to: 40-40, 109-112


65-84: LGTM! OpenAPI documentation properly reflects dual media type support.

The response documentation now correctly documents both application/json and text/plain media types with appropriate examples, improving API discoverability for consumers.


153-200: Media type handling looks good, but note existing concern about available_quotas placement.

The function correctly implements media-type-aware formatting for end events. However, a previous review comment noted that available_quotas at line 198 sits at the top level alongside event and data, which may be inconsistent with client expectations. Consider whether this should be nested inside the data object.

Based on past review comments


203-228: LGTM! Clean centralized event formatting.

The stream_event function provides a clean abstraction for media-type-aware event formatting. The implementation properly handles text/plain with appropriate formatting for each event type and includes error logging for unknown types.


231-670: Excellent media type propagation throughout event handlers!

The changes systematically propagate media_type through the entire event pipeline:

  • All handler functions consistently accept media_type with sensible defaults
  • All event emissions route through stream_event for consistent formatting
  • Backward compatibility maintained via default parameters
  • Clean separation of concerns between event construction and formatting

The implementation demonstrates good architectural discipline.


311-359: LGTM! Media-type-aware error handling properly implemented.

Both error handling functions (prompt_too_long_error and generic_llm_error) provide appropriate formatting for both media types:

  • Text format: Simple, readable error messages
  • JSON format: Structured error responses with appropriate fields (status_code, response, cause)

The functions follow the same clean pattern as stream_event and include proper logging.


769-774: LGTM! Proper media type integration in endpoint handler.

The changes correctly integrate media type throughout the streaming lifecycle:

  1. Media type properly extracted from request with JSON default
  2. Start event conditionally sent only for JSON (text/plain doesn't need it)
  3. Media type passed through to all event generators
  4. Streaming response properly typed with media_type

The broad exception handler at lines 854-864 is intentional (as documented in comments) to ensure all errors are converted to OLS-compatible streaming responses, maintaining the streaming connection. The APIConnectionError at line 843 correctly raises HTTPException since connection failures prevent streaming.

Also applies to: 793-799, 839-864

Comment on lines 194 to 207
# await the async function - should return a streaming response with error
response = await streaming_query_endpoint_handler(
    request, query_request, auth=MOCK_AUTH
)

assert isinstance(response, StreamingResponse)
assert response.media_type == "application/json"

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify connection error handling behavior change.

The test now expects StreamingResponse with media_type "application/json" instead of raising HTTPException. This aligns with the new generic_llm_error flow but represents a significant behavior change in error handling.

Based on the implementation in src/app/endpoints/streaming_query.py (lines 854-864), APIConnectionError still raises HTTPException while other exceptions return StreamingResponse. Please confirm this test accurately reflects the expected behavior for connection errors.


🏁 Script executed:

#!/bin/bash
# Verify how APIConnectionError is handled in the endpoint
rg -A 10 'except APIConnectionError' src/app/endpoints/streaming_query.py

Length of output: 438


Adjust test to expect HTTPException for connection errors. The handler’s except APIConnectionError block raises an HTTPException (500), so update tests/unit/app/endpoints/test_streaming_query.py (lines 194–200) to assert that an HTTPException is raised rather than returning a StreamingResponse.

🤖 Prompt for AI Agents
In tests/unit/app/endpoints/test_streaming_query.py around lines 194 to 200, the
test expects a StreamingResponse but the handler catches APIConnectionError and
raises an HTTPException(500); update the test to assert that calling
streaming_query_endpoint_handler raises HTTPException instead of returning a
StreamingResponse — use the test framework's context manager (e.g.,
pytest.raises(HTTPException)) or equivalent to call the handler and assert the
exception status_code is 500 and any relevant message.

@gallettilance gallettilance force-pushed the streaming-query branch 4 times, most recently from 09a77d2 to c8774bb (September 30, 2025 16:31)

@coderabbitai coderabbitai bot left a comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/unit/app/endpoints/test_streaming_query.py (1)

898-942: Inconsistency: comments claim "role field removed" but implementation includes it.

Comments at lines 898, 941, 969, 998, 1029, and 1081 state "Role field removed for OLS compatibility", but the implementation in streaming_query.py explicitly includes "role": "inference" in event data (e.g., lines 456-461, 496-502, 531-534).

The new OLS compatibility tests (lines 1902, 1920, 1937) actually assert that the role field is present. Either remove these misleading comments or update the implementation to match the stated intent.

♻️ Duplicate comments (2)
src/models/requests.py (1)

213-223: Address the previous review comment about using constants.

The validation logic correctly raises a ValueError for unsupported media types. However, as noted in the previous review comment, the hardcoded strings "application/json" and "text/plain" should be replaced with the MEDIA_TYPE_JSON and MEDIA_TYPE_TEXT constants from src/constants.py to prevent maintenance issues where constants and validation logic could drift apart.

Based on the previous review comment, apply this refactor:

 """Models for REST API requests."""
+
+from constants import MEDIA_TYPE_JSON, MEDIA_TYPE_TEXT
     @model_validator(mode="after")
     def validate_media_type(self) -> Self:
         """Validate media_type field."""
         if self.media_type and self.media_type not in [
-            "application/json",
-            "text/plain",
+            MEDIA_TYPE_JSON,
+            MEDIA_TYPE_TEXT,
         ]:
             raise ValueError(
-                "media_type must be either 'application/json' or 'text/plain'"
+                f"media_type must be either '{MEDIA_TYPE_JSON}' or '{MEDIA_TYPE_TEXT}'"
             )
         return self
tests/unit/app/endpoints/test_streaming_query.py (1)

201-207: Test expects StreamingResponse but handler raises HTTPException.

The test expects a StreamingResponse to be returned, but the handler's except APIConnectionError block (lines 871-881 in streaming_query.py) raises an HTTPException with status 500. Update the test to assert that an HTTPException is raised.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fca0b38 and c8774bb.

📒 Files selected for processing (5)
  • src/app/endpoints/streaming_query.py (21 hunks)
  • src/constants.py (1 hunks)
  • src/models/requests.py (2 hunks)
  • tests/unit/app/endpoints/test_streaming_query.py (12 hunks)
  • tests/unit/models/requests/test_query_request.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/constants.py
🧰 Additional context used
📓 Path-based instructions (9)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(__name__) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @abstractmethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed

Files:

  • tests/unit/app/endpoints/test_streaming_query.py
  • src/models/requests.py
  • tests/unit/models/requests/test_query_request.py
  • src/app/endpoints/streaming_query.py
tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard

Files:

  • tests/unit/app/endpoints/test_streaming_query.py
  • tests/unit/models/requests/test_query_request.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests

Files:

  • tests/unit/app/endpoints/test_streaming_query.py
  • tests/unit/models/requests/test_query_request.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)

Files:

  • src/models/requests.py
  • src/app/endpoints/streaming_query.py
src/{models/**/*.py,configuration.py}

📄 CodeRabbit inference engine (CLAUDE.md)

src/{models/**/*.py,configuration.py}: Use @field_validator and @model_validator for custom validation in Pydantic models
Use precise type hints in configuration (e.g., Optional[FilePath], PositiveInt, SecretStr)

Files:

  • src/models/requests.py
src/models/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/models/**/*.py: Pydantic models: use BaseModel for data models and extend ConfigurationBase for configuration
Use @model_validator and @field_validator for Pydantic model validation

Files:

  • src/models/requests.py
src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code

Files:

  • src/app/endpoints/streaming_query.py
src/{app/**/*.py,client.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use async def for I/O-bound operations and external API calls

Files:

  • src/app/endpoints/streaming_query.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling

Files:

  • src/app/endpoints/streaming_query.py
🧬 Code graph analysis (2)
tests/unit/app/endpoints/test_streaming_query.py (2)
src/app/endpoints/streaming_query.py (5)
  • stream_event (206-231)
  • stream_end_event (156-203)
  • prompt_too_long_error (314-336)
  • generic_llm_error (339-362)
  • streaming_query_endpoint_handler (689-892)
src/models/requests.py (1)
  • QueryRequest (72-223)
src/app/endpoints/streaming_query.py (1)
src/models/responses.py (1)
  • ToolCall (47-52)
🪛 GitHub Actions: Black
src/app/endpoints/streaming_query.py

[error] 1-1: Black formatting check failed. 1 file would be reformatted. Run 'black --write' to fix code style issues.

🪛 GitHub Actions: Python linter
src/app/endpoints/streaming_query.py

[error] 235-235: pylint: C0301 Line too long (120/100) in streaming_query.py

🪛 GitHub Actions: Unit tests
tests/unit/app/endpoints/test_streaming_query.py

[error] 1-1: Test failure: streaming query endpoint did not emit expected SSE event 'start'. The assertion '"event": "start"' was not found in the response stream.


[error] 1-1: Test failure: streaming query endpoint with transcripts enabled/disabled failed for SSE content validation (same root cause as above).

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-pr
  • GitHub Check: e2e_tests
🔇 Additional comments (13)
src/models/requests.py (1)

149-150: LGTM! Field description accurately documents supported media types.

The updated field description clearly indicates support for both "application/json" and "text/plain", which aligns with the PR objectives.

tests/unit/models/requests/test_query_request.py (1)

137-156: Invalid media_type ValueError coverage already exists. The invalid media_type scenario is covered in tests/unit/app/endpoints/test_streaming_query.py, so no additional test in test_query_request.py is needed.

Likely an incorrect or invalid review comment.

tests/unit/app/endpoints/test_streaming_query.py (3)

1678-1956: LGTM! Comprehensive OLS compatibility test coverage.

The new test classes provide excellent coverage for OLS-compatible event formatting across both JSON and text media types. Tests verify:

  • Token, tool_call, and tool_result event structures
  • End event formatting with referenced documents
  • Error handling for prompt_too_long and generic errors
  • Media type validation and event structure compliance

43-57: LGTM! Imports align with OLS compatibility changes.

The new imports for streaming helpers, event constants, and media types are properly used throughout the new test classes and align with the implementation in streaming_query.py.


799-800: Include conversation_id in your stream_build_event test setup.
The turn_start test calls stream_build_event(chunk, 0, {}), leaving conversation_id=None, so the emitted event omits "conversation_id". Update the test to pass the expected conversation_id (for example, stream_build_event(chunk, 0, {}, MEDIA_TYPE_JSON, '<your-id>')) or include it in the metadata_map.

src/app/endpoints/streaming_query.py (8)

112-115: LGTM! OLS-compatible event type constants.

The event type constants are well-named and follow the naming convention. They're properly used throughout the event handlers.


156-203: LGTM! Media-type-aware stream end event.

The stream_end_event function properly handles both JSON and text media types, formatting referenced documents appropriately for each. The text format provides a clean separator and document list, while JSON maintains the structured SSE format.


206-231: LGTM! Centralized event formatting with media type support.

The stream_event function provides clean media-type-aware formatting for all event types. The text/plain branch returns simple tokens or JSON-wrapped tool data, while the JSON branch uses consistent SSE formatting. Unknown event types are logged and return empty string, which is safe.


314-362: LGTM! Comprehensive media-type-aware error handling.

Both prompt_too_long_error and generic_llm_error provide appropriate error formatting for JSON and text media types. The JSON format includes structured error details (status_code, response, cause) while text format provides simple error messages. Error logging is appropriate.


368-398: Verify: text media type emits JSON SSE for start event.

When media_type == MEDIA_TYPE_TEXT, the start event at line 389-391 still emits a JSON SSE format (data: {...}\n\n). Is this intentional? The PR description states text/plain support was added, but control events (start, turn_complete) remain in JSON SSE format even for text/plain requests.

If this is intentional to keep metadata events structured, consider documenting this behavior. Otherwise, for pure text/plain responses, the start event might not need to be emitted at all (or emit as plain text).


454-683: Verify role field requirement for OLS compatibility.

All event handlers include "role": "inference" in their data dictionaries (e.g., lines 456-461, 496-502, 531-534, 569-574, etc.), but test comments claim "role field removed for OLS compatibility". The new OLS compatibility tests actually assert that the role field is present (test lines 1902, 1920, 1937).

Clarify the intended behavior: should the role field be included or excluded for OLS compatibility? If it should be included (as the implementation and new tests suggest), remove the misleading comments in the test file.


882-892: Inconsistent error handling between APIConnectionError and other exceptions.

Lines 882-892 catch generic exceptions and return a StreamingResponse with error content, but APIConnectionError (lines 871-881) raises an HTTPException. This is inconsistent.

If the goal is to return OLS-compatible streaming error responses for all errors (as the comment states), consider also returning a StreamingResponse for APIConnectionError instead of raising HTTPException. Otherwise, clarify why connection errors should terminate the stream while other errors should be streamed.
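
One possible shape for such a unified behaviour, purely as a sketch (the error text follows the test assertions elsewhere in this thread; the helper name and wiring are assumptions, not the merged code):

import json

from fastapi.responses import StreamingResponse

MEDIA_TYPE_JSON = "application/json"
MEDIA_TYPE_TEXT = "text/plain"


def connection_error_response(err: Exception, media_type: str = MEDIA_TYPE_JSON) -> StreamingResponse:
    """Stream a single OLS-style error event instead of raising HTTPException."""
    payload = {
        "event": "error",
        "data": {"response": "Unable to connect to Llama Stack", "cause": str(err)},
    }
    body = str(err) if media_type == MEDIA_TYPE_TEXT else f"data: {json.dumps(payload)}\n\n"

    async def _stream():
        # Emit one error event, then close the stream.
        yield body

    return StreamingResponse(
        _stream(),
        media_type="text/event-stream" if media_type == MEDIA_TYPE_JSON else MEDIA_TYPE_TEXT,
    )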


805-811: LGTM! Media type propagation through event generation.

The stream_build_event call properly passes media_type and conversation_id, and stream_end_event receives media_type. The media type flow from request through to event generation is correct.

@gallettilance gallettilance force-pushed the streaming-query branch 3 times, most recently from 07bab23 to 1510599 on October 1, 2025 at 13:28

@are-ces are-ces left a comment


LGTM, just a couple of nits


# media_type is not used in 'lightspeed-stack' that only supports application/json.
# the field is kept here to enable compatibility with 'road-core' clients.
media_type: Optional[str] = Field(

Please add media_type to the QueryRequest docstring, it was missing :)

"media_type was set in the request but is not supported. The value will be ignored."
"""Validate media_type field."""
if self.media_type and self.media_type not in [
"application/json",

As pointed out by coderabbit, use constants for media types


@tisnik tisnik left a comment


LGTM overall, just please look at @are-ces's comment.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
tests/unit/app/endpoints/test_streaming_query.py (1)

201-207: Test expectation doesn't match implementation behavior.

The test expects StreamingResponse, but the handler's APIConnectionError block (lines 881-891 in src/app/endpoints/streaming_query.py) raises HTTPException(500). Update the test to assert that an HTTPException is raised with status code 500.

Apply this diff:

-    # await the async function - should return a streaming response with error
-    response = await streaming_query_endpoint_handler(
-        request, query_request, auth=MOCK_AUTH
-    )
-
-    assert isinstance(response, StreamingResponse)
-    assert response.media_type == "application/json"
+    # await the async function - should raise HTTPException
+    with pytest.raises(HTTPException) as exc_info:
+        await streaming_query_endpoint_handler(
+            request, query_request, auth=MOCK_AUTH
+        )
+    
+    assert exc_info.value.status_code == status.HTTP_500_INTERNAL_SERVER_ERROR
+    assert "Unable to connect to Llama Stack" in exc_info.value.detail["response"]
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1510599 and 908e0f3.

📒 Files selected for processing (5)
  • src/app/endpoints/streaming_query.py (21 hunks)
  • src/constants.py (1 hunks)
  • src/models/requests.py (4 hunks)
  • tests/unit/app/endpoints/test_streaming_query.py (12 hunks)
  • tests/unit/models/requests/test_query_request.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/unit/models/requests/test_query_request.py
  • src/models/requests.py
🧰 Additional context used
📓 Path-based instructions (8)
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)

Files:

  • src/constants.py
  • src/app/endpoints/streaming_query.py
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(__name__) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @abstractmethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed

Files:

  • src/constants.py
  • tests/unit/app/endpoints/test_streaming_query.py
  • src/app/endpoints/streaming_query.py
src/constants.py

📄 CodeRabbit inference engine (CLAUDE.md)

Keep shared constants in a central src/constants.py with descriptive comments

Files:

  • src/constants.py
tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard

Files:

  • tests/unit/app/endpoints/test_streaming_query.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests

Files:

  • tests/unit/app/endpoints/test_streaming_query.py
src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code

Files:

  • src/app/endpoints/streaming_query.py
src/{app/**/*.py,client.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use async def for I/O-bound operations and external API calls

Files:

  • src/app/endpoints/streaming_query.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling

Files:

  • src/app/endpoints/streaming_query.py
🧬 Code graph analysis (2)
tests/unit/app/endpoints/test_streaming_query.py (2)
src/app/endpoints/streaming_query.py (5)
  • stream_event (210-235)
  • stream_end_event (157-207)
  • prompt_too_long_error (322-344)
  • generic_llm_error (347-370)
  • streaming_query_endpoint_handler (689-902)
src/models/requests.py (1)
  • QueryRequest (73-225)
src/app/endpoints/streaming_query.py (1)
src/models/responses.py (1)
  • ToolCall (96-101)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-pr
  • GitHub Check: e2e_tests (ci)
🔇 Additional comments (13)
src/constants.py (1)

119-122: LGTM!

The media type constants are well-defined, follow naming conventions, and use standard MIME types. The descriptive comment provides clear context for their usage in streaming responses.
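
For reference, the constants in question presumably boil down to something like the following (the values match the media types used throughout this PR; the comment wording is illustrative):

# Media types supported by streaming query responses.
MEDIA_TYPE_JSON = "application/json"
MEDIA_TYPE_TEXT = "text/plain"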

tests/unit/app/endpoints/test_streaming_query.py (3)

43-49: LGTM!

The new imports for streaming helpers and event constants are necessary for the OLS compatibility tests. The imports are correctly organized and follow Python conventions.


57-57: LGTM!

The import of media type constants is correct and necessary for the OLS compatibility tests.


1732-1993: LGTM!

The new OLS compatibility test classes are comprehensive and well-structured. They effectively validate:

  • Media type-aware formatting for both JSON and text outputs
  • Correct event types (token, tool_call, tool_result, end)
  • Proper data structure without the "role" field (OLS compatibility)
  • Error handling for both prompt_too_long and generic errors
  • End event structure with all required fields

The tests follow pytest conventions and provide good coverage for the OLS compatibility features.

src/app/endpoints/streaming_query.py (9)

7-7: LGTM!

The import additions are appropriate:

  • uuid is used for generating conversation IDs
  • cast from typing is used for type safety
  • Media type constants are imported from the centralized constants module

All imports are used in the code and follow Python conventions.

Also applies to: 9-9, 43-43


113-116: LGTM!

The OLS-compatible event type constants are well-defined and clearly documented. They provide a single source of truth for event types used throughout the streaming logic.


69-87: LGTM!

The OpenAPI response schema has been appropriately updated to document both media types (application/json and text/plain). The examples clearly illustrate the expected format for each media type, improving API documentation.


157-207: LGTM!

The stream_end_event function has been effectively updated to support both media types:

  • Text/plain: returns formatted reference documents with a separator
  • JSON: returns structured SSE-formatted data with referenced documents

The default parameter ensures backward compatibility, and the implementation correctly filters and formats metadata.
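
A condensed sketch of the two end-event shapes described above; the document field names (doc_title, doc_url) and the metadata filtering are assumptions based on the structures quoted elsewhere in this PR:

import json

MEDIA_TYPE_JSON = "application/json"
MEDIA_TYPE_TEXT = "text/plain"


def stream_end_event(metadata_map: dict, media_type: str = MEDIA_TYPE_JSON) -> str:
    """Emit the final event of a stream for the requested media type."""
    referenced_docs = [
        {"doc_title": meta.get("title"), "doc_url": meta.get("docs_url")}
        for meta in metadata_map.values()
        if "docs_url" in meta
    ]
    if media_type == MEDIA_TYPE_TEXT:
        # Plain text: a separator followed by a simple list of references.
        links = "\n".join(f"{d['doc_title']}: {d['doc_url']}" for d in referenced_docs)
        return f"\n\n---\n\n{links}\n" if links else ""
    # JSON: structured SSE end event with placeholder token counts.
    return "data: " + json.dumps(
        {
            "event": "end",
            "data": {
                "rag_chunks": [],
                "referenced_documents": referenced_docs,
                "truncated": None,
                "input_tokens": 0,
                "output_tokens": 0,
            },
            "available_quotas": {},
        }
    ) + "\n\n"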


210-235: LGTM!

The stream_event function provides clean centralized formatting for streaming events:

  • Text/plain: extracts token text directly or formats tool events as JSON strings
  • JSON: wraps events in SSE format consistently
  • Includes error logging for unknown event types in text mode

The implementation is clear and maintainable.


238-244: LGTM!

The updated function signature for stream_build_event properly adds media_type and conversation_id parameters with appropriate defaults. The multi-line formatting follows Python conventions and stays within line length limits.


264-288: LGTM!

The dispatch logic in stream_build_event correctly passes the media_type parameter to all handlers, and appropriately forwards conversation_id to the turn_start handler. The pattern matching is clean and maintainable.


294-370: LGTM!

The error handling functions are well-structured and provide OLS-compatible responses:

  • _handle_error_event: handles chunk errors with media type awareness
  • prompt_too_long_error: returns 413 status code with descriptive error details
  • generic_llm_error: provides structured error responses for generic LLM failures

All functions follow consistent patterns, include appropriate logging, and format errors correctly for both media types.


376-684: LGTM!

All event handlers have been consistently updated to support media type awareness:

  • Handlers now accept media_type parameter with appropriate defaults
  • Most handlers use stream_event for consistent formatting
  • Each handler correctly branches on media type to produce text or JSON output
  • Tool execution handler properly handles different tool types (memory queries, knowledge search, etc.)

The implementation maintains consistency across all event types while providing the flexibility needed for OLS compatibility.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (2)
src/app/endpoints/streaming_query.py (1)

195-207: Consider moving available_quotas into the data object for consistency.

Currently, available_quotas sits at the top level alongside event and data (line 205). This is inconsistent with other events where all payload information is nested inside data. For a uniform event structure, consider moving it inside the data object:

     return format_stream_data(
         {
             "event": "end",
             "data": {
                 "rag_chunks": [],  # TODO(jboos): implement RAG chunks when summary is available
                 "referenced_documents": referenced_docs_dict,
                 "truncated": None,  # TODO(jboos): implement truncated
                 "input_tokens": 0,  # TODO(jboos): implement input tokens
                 "output_tokens": 0,  # TODO(jboos): implement output tokens
+                "available_quotas": {},  # TODO(jboos): implement available quotas
             },
-            "available_quotas": {},  # TODO(jboos): implement available quotas
         }
     )

This maintains consistency with the event structure used throughout the streaming API.

tests/unit/app/endpoints/test_streaming_query.py (1)

201-207: Fix test to expect HTTPException for connection errors.

The test expects a StreamingResponse, but the handler's except APIConnectionError block (lines 880-891 in streaming_query.py) raises an HTTPException with status 500. This mismatch is causing the pipeline failure.

Update the test to assert that an HTTPException is raised:

-    # await the async function - should return a streaming response with error
-    response = await streaming_query_endpoint_handler(
-        request, query_request, auth=MOCK_AUTH
-    )
-
-    assert isinstance(response, StreamingResponse)
-    assert response.media_type == "application/json"
+    # await the async function - should raise HTTPException for connection errors
+    with pytest.raises(HTTPException) as exc_info:
+        await streaming_query_endpoint_handler(
+            request, query_request, auth=MOCK_AUTH
+        )
+    
+    assert exc_info.value.status_code == status.HTTP_500_INTERNAL_SERVER_ERROR
+    assert "Unable to connect to Llama Stack" in exc_info.value.detail["response"]

As per coding guidelines

🧹 Nitpick comments (1)
src/app/endpoints/streaming_query.py (1)

376-408: Optional: Reformat long line instead of using pylint disable.

Line 400 uses an explicit pylint: disable=line-too-long comment. Consider breaking the line for better readability:

-        yield (
-            f"data: {json.dumps({'event': 'start', 'data': {'conversation_id': conversation_id}})}\n\n"  # pylint: disable=line-too-long
-        )
+        start_data = {
+            "event": "start",
+            "data": {"conversation_id": conversation_id},
+        }
+        yield f"data: {json.dumps(start_data)}\n\n"

This maintains readability while staying within the line length limit.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 908e0f3 and ea5eadd.

📒 Files selected for processing (5)
  • src/app/endpoints/streaming_query.py (21 hunks)
  • src/constants.py (1 hunks)
  • src/models/requests.py (4 hunks)
  • tests/unit/app/endpoints/test_streaming_query.py (12 hunks)
  • tests/unit/models/requests/test_query_request.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/unit/models/requests/test_query_request.py
  • src/constants.py
🧰 Additional context used
📓 Path-based instructions (9)
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)

Files:

  • src/models/requests.py
  • src/app/endpoints/streaming_query.py
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(__name__) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @abstractmethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed

Files:

  • src/models/requests.py
  • tests/unit/app/endpoints/test_streaming_query.py
  • src/app/endpoints/streaming_query.py
src/{models/**/*.py,configuration.py}

📄 CodeRabbit inference engine (CLAUDE.md)

src/{models/**/*.py,configuration.py}: Use @field_validator and @model_validator for custom validation in Pydantic models
Use precise type hints in configuration (e.g., Optional[FilePath], PositiveInt, SecretStr)

Files:

  • src/models/requests.py
src/models/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/models/**/*.py: Pydantic models: use BaseModel for data models and extend ConfigurationBase for configuration
Use @model_validator and @field_validator for Pydantic model validation

Files:

  • src/models/requests.py
tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard

Files:

  • tests/unit/app/endpoints/test_streaming_query.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests

Files:

  • tests/unit/app/endpoints/test_streaming_query.py
src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code

Files:

  • src/app/endpoints/streaming_query.py
src/{app/**/*.py,client.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use async def for I/O-bound operations and external API calls

Files:

  • src/app/endpoints/streaming_query.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling

Files:

  • src/app/endpoints/streaming_query.py
🧬 Code graph analysis (2)
tests/unit/app/endpoints/test_streaming_query.py (2)
src/app/endpoints/streaming_query.py (5)
  • stream_event (210-235)
  • stream_end_event (157-207)
  • prompt_too_long_error (322-344)
  • generic_llm_error (347-370)
  • streaming_query_endpoint_handler (689-906)
src/models/requests.py (2)
  • QueryRequest (73-225)
  • Attachment (16-70)
src/app/endpoints/streaming_query.py (1)
src/models/responses.py (1)
  • ToolCall (96-101)
🪛 GitHub Actions: Unit tests
tests/unit/app/endpoints/test_streaming_query.py

[error] 1-1: Test 'test_streaming_query_endpoint_on_connection_error' failed: response.media_type == 'application/json' but was 'text/event-stream'.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: e2e_tests (ci)
  • GitHub Check: build-pr
🔇 Additional comments (7)
src/models/requests.py (1)

11-11: LGTM! Media type validation properly implemented.

The changes correctly:

  • Import media type constants from the constants module
  • Update the field description and examples to use the constants
  • Enforce validation by raising ValueError for invalid media types instead of just logging a warning

This aligns with the PR objective of ensuring OLS compatibility by strictly validating supported media types.

Also applies to: 84-84, 151-152, 217-224
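
To make the enforced behaviour concrete, a stripped-down Pydantic sketch (the field set, descriptions, and validator name are assumptions; only the media_type handling is shown):

from typing import Optional

from pydantic import BaseModel, Field, model_validator

MEDIA_TYPE_JSON = "application/json"
MEDIA_TYPE_TEXT = "text/plain"


class QueryRequest(BaseModel):
    """Trimmed request model illustrating only the media_type validation."""

    query: str
    media_type: Optional[str] = Field(
        default=None,
        description="Media type for the streamed response (application/json or text/plain).",
        examples=[MEDIA_TYPE_JSON, MEDIA_TYPE_TEXT],
    )

    @model_validator(mode="after")
    def validate_media_type(self) -> "QueryRequest":
        """Raise ValueError for unsupported media types instead of silently ignoring them."""
        if self.media_type and self.media_type not in (MEDIA_TYPE_JSON, MEDIA_TYPE_TEXT):
            raise ValueError(f"Unsupported media type: {self.media_type}")
        return self

With a model like this, constructing QueryRequest(query="hi", media_type="text/html") would fail with a ValidationError wrapping the ValueError above, which is the stricter behaviour the review describes.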

tests/unit/app/endpoints/test_streaming_query.py (2)

43-50: LGTM! Test imports align with new public API.

The new imports correctly reference the expanded public API from streaming_query.py (stream_event, stream_end_event, error handlers, event constants) and media type constants. These are necessary for testing OLS compatibility features.

Also applies to: 57-57


1725-1993: LGTM! Comprehensive OLS compatibility tests.

The new test classes thoroughly validate:

  • Event formatting for both JSON and text media types
  • Token, tool call, and tool result event structures
  • End event structure with referenced documents
  • Error handling for prompt too long and generic LLM errors
  • Media type validation in QueryRequest
  • Proper exclusion of "role" field for OLS compatibility

The tests are well-organized and follow pytest best practices.

src/app/endpoints/streaming_query.py (4)

113-116: LGTM! OLS-compatible event constants.

The new event type constants (LLM_TOKEN_EVENT, LLM_TOOL_CALL_EVENT, LLM_TOOL_RESULT_EVENT) provide clear abstractions for the different event types and align with OLS compatibility requirements.
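
The constants presumably map onto the OLS event names listed in the walkthrough, roughly as follows (the exact assignments are an assumption):

# OLS-compatible streaming event names (assumed values, matching the event
# types exercised by the new tests).
LLM_TOKEN_EVENT = "token"
LLM_TOOL_CALL_EVENT = "tool_call"
LLM_TOOL_RESULT_EVENT = "tool_result"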


210-235: LGTM! Central event formatting function.

The stream_event function properly centralizes media-type-aware event formatting. It correctly:

  • Returns plain text tokens for MEDIA_TYPE_TEXT
  • Returns SSE-formatted JSON for MEDIA_TYPE_JSON
  • Handles tool call and tool result events appropriately
  • Logs unknown event types and returns empty string as a safe fallback

The function signature includes complete type hints and a clear docstring following Google style.


322-370: LGTM! Well-structured error handling functions.

The error handling functions (prompt_too_long_error and generic_llm_error) properly:

  • Support both text and JSON media types
  • Include appropriate error details (status_code, response, cause)
  • Use proper logging (error level with exception details)
  • Follow Google-style docstrings with complete type hints
  • Return OLS-compatible error structures

These align with the PR objective of providing richer error objects compatible with OLS.


784-785: LGTM! Media type properly propagated throughout streaming flow.

The media type is correctly:

  • Determined from query_request.media_type or defaulted to MEDIA_TYPE_JSON (line 785)
  • Passed to stream_build_event along with conversation_id (lines 809-811)
  • Passed to stream_end_event (line 815)
  • Used to set appropriate HTTP Content-Type header as text/event-stream for SSE (line 878)

This ensures consistent media-type-aware formatting throughout the streaming response lifecycle.

Also applies to: 809-811, 815-815, 874-879
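
Roughly, the propagation path summarised above can be pictured as follows (handler and generator names are assumptions; only the media-type plumbing is shown):

from typing import AsyncIterator, Callable

from fastapi.responses import StreamingResponse

MEDIA_TYPE_JSON = "application/json"
MEDIA_TYPE_TEXT = "text/plain"


def build_streaming_response(
    query_request: object,
    response_generator: Callable[[str], AsyncIterator[str]],
) -> StreamingResponse:
    """Derive the media type from the request and wire it through to the response."""
    # Default to JSON when the client did not ask for a specific media type.
    media_type = getattr(query_request, "media_type", None) or MEDIA_TYPE_JSON
    stream = response_generator(media_type)
    return StreamingResponse(
        stream,
        # JSON streams are delivered as server-sent events; text streams stay plain.
        media_type="text/event-stream" if media_type == MEDIA_TYPE_JSON else MEDIA_TYPE_TEXT,
    )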


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
tests/unit/app/endpoints/test_streaming_query.py (1)

201-207: Critical: Test expects wrong response type for connection errors.

The test expects a StreamingResponse, but the implementation at src/app/endpoints/streaming_query.py lines 880-891 raises an HTTPException with status code 500 for APIConnectionError. This test will fail when the connection error path is exercised.

Apply this diff to fix the test:

-    # await the async function - should return a streaming response with error
-    response = await streaming_query_endpoint_handler(
-        request, query_request, auth=MOCK_AUTH
-    )
-
-    assert isinstance(response, StreamingResponse)
-    assert response.media_type == "text/event-stream"
+    # await the async function - should raise HTTPException for connection errors
+    with pytest.raises(HTTPException) as exc_info:
+        await streaming_query_endpoint_handler(
+            request, query_request, auth=MOCK_AUTH
+        )
+    
+    assert exc_info.value.status_code == status.HTTP_500_INTERNAL_SERVER_ERROR
+    assert "Unable to connect to Llama Stack" in exc_info.value.detail["response"]
🧹 Nitpick comments (2)
src/models/requests.py (1)

84-84: Optional: Update docstring to use constant names.

The docstring at line 84 mentions "application/json or text/plain" as hardcoded strings. For consistency, consider referencing the constant names instead.

Apply this diff:

-        media_type: The optional media type for response format (application/json or text/plain).
+        media_type: The optional media type for response format (MEDIA_TYPE_JSON or MEDIA_TYPE_TEXT).
src/app/endpoints/streaming_query.py (1)

376-408: Simplify text media type start event formatting.

For MEDIA_TYPE_TEXT at line 400, the function still outputs SSE-formatted JSON. This is inconsistent with how stream_event handles text media type (which returns plain text for tokens). While this might be intentional for control events, it creates confusion and results in a very long line that exceeds limits.

Consider either:

  1. Using stream_event for consistency, or
  2. Simplifying the text case if control events must always be JSON

Apply this diff to improve readability at minimum:

     if media_type == MEDIA_TYPE_TEXT:
-        yield (
-            f"data: {json.dumps({'event': 'start', 'data': {'conversation_id': conversation_id}})}\n\n"  # pylint: disable=line-too-long
-        )
+        start_data = {
+            'event': 'start',
+            'data': {'conversation_id': conversation_id}
+        }
+        yield f"data: {json.dumps(start_data)}\n\n"
     else:
         yield format_stream_data(
             {
                 "event": "start",
                 "data": {"conversation_id": conversation_id},
             }
         )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ea5eadd and f4fe32f.

📒 Files selected for processing (5)
  • src/app/endpoints/streaming_query.py (21 hunks)
  • src/constants.py (1 hunks)
  • src/models/requests.py (4 hunks)
  • tests/unit/app/endpoints/test_streaming_query.py (12 hunks)
  • tests/unit/models/requests/test_query_request.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/unit/models/requests/test_query_request.py
🧰 Additional context used
📓 Path-based instructions (10)
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)

Files:

  • src/models/requests.py
  • src/constants.py
  • src/app/endpoints/streaming_query.py
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(__name__) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @abstractmethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed

Files:

  • src/models/requests.py
  • src/constants.py
  • tests/unit/app/endpoints/test_streaming_query.py
  • src/app/endpoints/streaming_query.py
src/{models/**/*.py,configuration.py}

📄 CodeRabbit inference engine (CLAUDE.md)

src/{models/**/*.py,configuration.py}: Use @field_validator and @model_validator for custom validation in Pydantic models
Use precise type hints in configuration (e.g., Optional[FilePath], PositiveInt, SecretStr)

Files:

  • src/models/requests.py
src/models/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/models/**/*.py: Pydantic models: use BaseModel for data models and extend ConfigurationBase for configuration
Use @model_validator and @field_validator for Pydantic model validation

Files:

  • src/models/requests.py
src/constants.py

📄 CodeRabbit inference engine (CLAUDE.md)

Keep shared constants in a central src/constants.py with descriptive comments

Files:

  • src/constants.py
tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard

Files:

  • tests/unit/app/endpoints/test_streaming_query.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests

Files:

  • tests/unit/app/endpoints/test_streaming_query.py
src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code

Files:

  • src/app/endpoints/streaming_query.py
src/{app/**/*.py,client.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Use async def for I/O-bound operations and external API calls

Files:

  • src/app/endpoints/streaming_query.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling

Files:

  • src/app/endpoints/streaming_query.py
🧬 Code graph analysis (2)
tests/unit/app/endpoints/test_streaming_query.py (3)
src/app/endpoints/streaming_query.py (4)
  • stream_event (210-235)
  • stream_end_event (157-207)
  • prompt_too_long_error (322-344)
  • generic_llm_error (347-370)
src/models/requests.py (1)
  • QueryRequest (73-225)
src/utils/types.py (2)
  • ToolCallSummary (73-86)
  • TurnSummary (89-163)
src/app/endpoints/streaming_query.py (1)
src/models/responses.py (1)
  • ToolCall (96-101)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-pr
  • GitHub Check: e2e_tests (ci)
🔇 Additional comments (8)
src/models/requests.py (1)

11-11: LGTM! Constants are properly imported and used in validation.

The validation logic correctly uses MEDIA_TYPE_JSON and MEDIA_TYPE_TEXT constants from src/constants.py, addressing the previous review feedback about avoiding hardcoded strings.

Also applies to: 218-224

tests/unit/app/endpoints/test_streaming_query.py (1)

1732-1976: LGTM! Comprehensive OLS compatibility test coverage.

The new OLS compatibility tests thoroughly exercise:

  • Event formatting for both JSON and text media types
  • Token, tool call, and tool result events
  • End event structure with referenced documents
  • Error handling for both media types
  • Integration scenarios verifying role field omission

The test organization and coverage are excellent.
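
As a flavour of what these assertions look like, a hedged example (the import path and the stream_event signature follow the code-graph snippets above but remain assumptions):

import json

from app.endpoints.streaming_query import stream_event

MEDIA_TYPE_JSON = "application/json"


def test_token_event_json_is_sse_framed_without_role() -> None:
    """A JSON-mode token event is SSE-framed and carries no role field."""
    raw = stream_event({"id": 0, "token": "Hello"}, "token", MEDIA_TYPE_JSON)
    assert raw.startswith("data: ") and raw.endswith("\n\n")
    payload = json.loads(raw.removeprefix("data: "))
    assert payload["event"] == "token"
    assert "role" not in payload["data"]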

src/app/endpoints/streaming_query.py (6)

113-117: LGTM! Event type constants are well-defined.

The OLS-compatible event type constants are clearly defined and appropriately scoped for use throughout the streaming logic.


210-235: LGTM! Stream event formatting correctly handles both media types.

The stream_event function properly routes based on media type:

  • Text media type returns plain tokens and formatted tool events
  • JSON media type returns SSE-formatted events with proper structure
  • Unknown event types are logged and handled gracefully

157-207: LGTM! End event formatting is media-type aware.

The function correctly handles both media types:

  • Text returns plain reference document links
  • JSON returns structured SSE with referenced documents and placeholder token counts

The placement of available_quotas at the root level (line 205) is consistent with test expectations in test_streaming_query.py line 1993.


322-370: LGTM! Error handling functions provide OLS-compatible responses.

Both prompt_too_long_error and generic_llm_error correctly:

  • Handle media type routing for text vs JSON responses
  • Include appropriate status codes and error details
  • Log errors at appropriate levels
  • Provide structured error information compatible with OLS

411-684: LGTM! Media type is properly propagated through all event handlers.

All event handlers have been updated to:

  • Accept media_type parameter with appropriate defaults
  • Use stream_event for consistent media-type-aware formatting
  • Maintain consistent patterns for token, tool call, and tool result events

The refactoring ensures OLS compatibility across all event types.


784-786: LGTM! Media type handling and exception flow are correct.

The implementation properly:

  • Derives media_type from the query request (lines 784-786)
  • Propagates it through event generation (lines 809-815)
  • Uses it for response formatting (lines 874-879)
  • Handles errors with OLS-compatible responses (lines 892-906)
  • Sets appropriate Content-Type headers based on media type (lines 902-905)

The exception handling correctly distinguishes between APIConnectionError (raises HTTPException) and other errors (returns streaming error response).

Also applies to: 809-815, 874-906


are-ces commented Oct 9, 2025

LGTM nice work

@tisnik tisnik merged commit 111bb6e into lightspeed-core:main Oct 9, 2025
18 of 19 checks passed