
Conversation

@radofuchs (Contributor) commented Sep 18, 2025

Description

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement

Related Tickets & Documents

  • Related Issue #LCORE-432
  • Closes #LCORE-432

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • New Features

    • Streaming endpoint now supports Server-Sent Events with example payloads.
    • Info now includes Llama Stack version.
    • Query requests support a new media_type field.
  • Bug Fixes

    • Standardized connectivity errors from 503 to 500 across info/models/query.
    • Added explicit 400/401/403 authentication/authorization errors to streaming and conversations endpoints.
  • Documentation

    • OpenAPI and docs updated for SSE responses, new fields, and revised error codes.
    • Feedback example updated to negative sentiment with categories.
  • Tests

    • OpenAPI spec tests updated to reflect new response codes and endpoints.

@radofuchs requested a review from tisnik on September 18, 2025 at 10:56
coderabbitai bot (Contributor) commented Sep 18, 2025

Walkthrough

This PR updates OpenAPI docs and server endpoint response schemas, adds SSE documentation and handlers for the streaming query, aligns connectivity errors to 500, and introduces credential/authorization error responses across conversation endpoints. Tests are updated to reflect new response codes. InfoResponse gains llama_stack_version; docs are synced.

Changes

  • OpenAPI and Docs (docs/openapi.json, docs/openapi.md, docs/output.md): Syncs API docs: adds llama_stack_version to InfoResponse; documents SSE for POST /v1/streaming_query; adds 400/401/403/500 responses; maps 503→500 for models/query/info; updates examples and wording.
  • Streaming SSE Implementation (src/app/endpoints/streaming_query.py): Introduces SSE utilities and streaming lifecycle; adds streaming_query_responses with SSE example; wires 400/401 (UnauthorizedResponse), 403 (ForbiddenResponse), 500; updates handler to emit start/token/end and collect metrics.
  • Conversations Responses (src/app/endpoints/conversations.py): Imports UnauthorizedResponse; adds 400/401 to list/get/delete conversations response maps; no control-flow changes.
  • Models/Query/Info Responses (src/app/endpoints/models.py, src/app/endpoints/query.py, src/app/endpoints/info.py): Maps connectivity errors to 500 (models/query); updates info docstring to mention Llama-stack version; no runtime logic changes.
  • OpenAPI Tests (tests/integration/test_openapi_json.py): Adjusts expected response codes for affected endpoints (adds 400/401, replaces 503 with 500, updates streaming/query expectations).

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant C as Client
  participant API as /v1/streaming_query
  participant LS as Llama Stack
  participant SSE as SSE Stream

  Note over API,SSE: New SSE flow (text/event-stream)
  C->>API: POST /v1/streaming_query (request)
  API->>SSE: open stream
  API->>SSE: event: start
  API->>LS: initiate streaming inference
  loop chunks
    LS-->>API: token/tool/heartbeat chunk
    API->>SSE: event: token / tool / heartbeat
  end
  LS-->>API: done
  API->>SSE: event: end (usage, transcript, metadata)
  SSE-->>C: streamed events
  alt Auth or permission error
    API-->>C: 400/401/403 (JSON)
  else Connection/internal error
    API-->>C: 500 (detail)
  end
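To make the flow above concrete, here is a minimal client sketch that consumes the stream. The base URL and the request field name are assumptions for illustration, not taken from the PR; any streaming-capable HTTP client would work in place of httpx.

import json

import httpx  # assumed client; any streaming-capable HTTP library works


def stream_query(prompt: str) -> None:
    """Consume the /v1/streaming_query SSE stream and print each event."""
    payload = {"query": prompt}  # request body shape is an assumption
    with httpx.stream(
        "POST", "http://localhost:8080/v1/streaming_query", json=payload
    ) as response:
        response.raise_for_status()
        for line in response.iter_lines():
            # Each SSE frame arrives as a "data: <json>" line followed by a blank line.
            if line.startswith("data: "):
                event = json.loads(line[len("data: "):])
                print(event["event"], event.get("data"))


if __name__ == "__main__":
    stream_query("Hello")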

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • tisnik
  • umago
  • manstis
  • maorfr

Poem

Hoppity hop, I stream with glee,
Tokens drift by like leaves from a tree.
Start, token, end—an SSE tune,
Credentials checked, errors in tune.
Models now five-hundred when stacks won’t play,
Llama-stack version lights the way.
Thump-thump—ship it today! 🐇✨

Pre-merge checks

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The title "LCORE-432: update doc for endpoints" is related to the changeset because substantial documentation/OpenAPI and test updates are present, but it understates the scope: the PR also introduces public API signature changes (new fields on InfoResponse/QueryRequest) and a new SSE streaming endpoint in source code, so the title is accurate but not fully descriptive. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |


@tisnik (Contributor) left a comment

LGTM overall; you just have two files there that should not be there.

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/openapi.json (1)

177-183: Invalid OpenAPI: “detail” at the top level of a response is not allowed.

Move the payload under content with a schema (e.g., ErrorResponse). As written, this won’t validate.

-                    "500": {
-                        "description": "Internal Server Error",
-                        "detail": {
-                            "response": "Unable to connect to Llama Stack",
-                            "cause": "Connection error."
-                        }
-                    },
+                    "500": {
+                        "description": "Internal Server Error",
+                        "content": {
+                            "application/json": {
+                                "schema": { "$ref": "#/components/schemas/ErrorResponse" }
+                            }
+                        }
+                    },
🧹 Nitpick comments (12)
src/auth/README.md (1)

1-2: Expand README to convey module purpose and usage.

A single header doesn’t help future contributors. Consider adding: overview, responsibilities, security model, env vars, key files, and how to add a new auth provider.

Apply this minimal scaffold:

-# List of source files stored in `src/auth` directory
+# Auth module
+
+Purpose: authentication/authorization utilities and integrations.
+
+Includes:
+- Providers: OAuth/OIDC, API key, service tokens
+- Middlewares/guards
+- Token/signing utilities
+
+Conventions:
+- All providers implement a common interface
+- No network calls in pure helpers
+
+Env vars:
+- AUTH_ISSUER, AUTH_AUDIENCE, AUTH_PUBLIC_KEYS_URL (example)
+
+Contributing:
+- Adding a provider: steps, tests, docs links
+
+Ownership: <team/Slack channel>
src/services/README.md (1)

1-2: Document service layer structure and conventions.

Add brief guidance so newcomers know how to navigate and extend services.

Suggested skeleton:

-# List of source files stored in `src/services` directory
+# Services layer
+
+Purpose: business logic orchestrating repositories, clients, and auth.
+
+Layout:
+- <service_name>Service.* — stateless, dependency-injected
+- Errors: use domain-specific error types
+
+Conventions:
+- Timeouts/retries handled in clients; services remain pure
+- No I/O in pure helpers; side effects isolated
+
+Testing:
+- Unit tests with mocked clients
+- Integration tests for critical flows
+
+Adding a service:
+- Interface, implementation, wiring, tests, docs
+
+Ownership: <team/Slack channel>
docs/output.md (1)

77-77: Table formatting needs blank lines.

Add blank lines before and after the table for proper Markdown formatting:

 | Status Code | Description | Component |
 |-------------|-------------|-----------|
 | 200 | Successful Response | [ModelsResponse](#modelsresponse) |
 | 500 | Connection to Llama Stack is broken |  |
+
 ## POST `/v1/query`
docs/openapi.md (1)

77-77: Table formatting needs blank lines.

Add blank lines around the table for proper Markdown formatting:

 | Status Code | Description | Component |
 |-------------|-------------|-----------|
 | 200 | Successful Response | [ModelsResponse](#modelsresponse) |
 | 500 | Connection to Llama Stack is broken |  |
+
 ## POST `/v1/query`
src/app/endpoints/streaming_query.py (1)

640-683: Verify error handling in the streaming generator.

The response_generator function handles the SSE stream well, but consider what happens if an exception occurs during streaming after the response has started. The client would receive a partial stream without a proper error event.

Consider adding error handling within the generator to emit an error event:

async def response_generator(
    turn_response: AsyncIterator[AgentTurnResponseStreamChunk],
) -> AsyncIterator[str]:
    chunk_id = 0
    summary = TurnSummary(
        llm_response="No response from the model", tool_calls=[]
    )
    
    # Send start event
    yield stream_start_event(conversation_id)
    
    try:
        async for chunk in turn_response:
            # ... existing chunk processing ...
            
        yield stream_end_event(metadata_map)
    except Exception as e:
        # Emit error event to the stream
        logger.exception("Error during streaming: %s", str(e))
        yield format_stream_data({
            "event": "error",
            "data": {
                "id": chunk_id,
                "message": "Stream interrupted due to an error"
            }
        })
        raise  # Re-raise to ensure proper cleanup
    
    # ... rest of the function
docs/openapi.json (7)

53-53: Tighten wording and casing in description; drop docstring-style “Returns”.

Use consistent “Llama Stack” casing and avoid Python‑docstring phrasing inside OpenAPI descriptions.

- "description": "Handle request to the /info endpoint.\n\nProcess GET requests to the /info endpoint, returning the\nservice name, version and Llama-stack version.\n\nReturns:\n    InfoResponse: An object containing the service's name and version.",
+ "description": "Handles GET /v1/info and returns service name, service version, and Llama Stack version.",

120-122: 500 response lacks JSON content schema; add ErrorResponse and align semantics.

Right now it only has a description. Either keep 503 for upstream dependency failure or, if standardizing on 500, at least include the schema.

-                    "500": {
-                        "description": "Connection to Llama Stack is broken"
-                    }
+                    "500": {
+                        "description": "Internal Server Error: Unable to connect to Llama Stack",
+                        "content": {
+                            "application/json": {
+                                "schema": { "$ref": "#/components/schemas/ErrorResponse" }
+                            }
+                        }
+                    }

Please confirm integration tests were updated to expect 500 with ErrorResponse for /v1/models.


217-229: SSE 200 response: consider removing application/json stub or document handshake.

If 200 always streams SSE, keep only text/event-stream. If a JSON prelude is emitted, document its structure; empty {} schema is misleading.

-                        "content": {
-                            "application/json": {
-                                "schema": {}
-                            },
+                        "content": {
                             "text/event-stream": {
                                 "schema": {
                                     "type": "string",
                                     "example": "data: {\"event\": \"start\", \"data\": {\"conversation_id\": \"123e4567-e89b-12d3-a456-426614174000\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 0, \"role\": \"inference\", \"token\": \"Hello\"}}\n\ndata: {\"event\": \"end\", \"data\": {\"referenced_documents\": [], \"truncated\": null, \"input_tokens\": 0, \"output_tokens\": 0}, \"available_quotas\": {}}\n\n"
                                 }
                             }
                         }

If you keep JSON, add a concrete schema (e.g., a minimal handshake object) and document when it’s sent.


230-249: 400 vs 401 semantics: clarify boundary and reuse consistent descriptions.

You map both 400 and 401 to UnauthorizedResponse. If 400 is only for noop/noop-with-token modules (as elsewhere), reflect that in the description and ensure clients can distinguish. Otherwise drop 400 here.

Suggested tweak (descriptions only):

- "description": "Missing or invalid credentials provided by client"
+ "description": "Bad Request: missing/invalid credentials for noop/noop-with-token auth"

519-538: Conversations list: verify 400/401 usage matches auth backends and keep payload schema consistent.

Both map to UnauthorizedResponse, which is fine; ensure error messages are distinct enough for clients, and confirm 503 vs 500 policy across endpoints.


596-615: Conversation GET: same 400/401 semantics consistency check.

Align descriptions with actual behavior and keep UnauthorizedResponse for both or specialize schemas if needed.


674-693: Conversation DELETE: same 400/401 consistency and client‑distinguishable messages.

Also ensure upstream connectivity errors use a consistent status (500 vs 503) across all conversation endpoints.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5eb2caa and dcfb068.

📒 Files selected for processing (11)
  • docs/openapi.json (7 hunks)
  • docs/openapi.md (7 hunks)
  • docs/output.md (7 hunks)
  • src/app/endpoints/conversations.py (4 hunks)
  • src/app/endpoints/info.py (1 hunks)
  • src/app/endpoints/models.py (1 hunks)
  • src/app/endpoints/query.py (1 hunks)
  • src/app/endpoints/streaming_query.py (3 hunks)
  • src/auth/README.md (1 hunks)
  • src/services/README.md (1 hunks)
  • tests/integration/test_openapi_json.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/app/endpoints/conversations.py (1)
src/models/responses.py (1)
  • UnauthorizedResponse (361-375)
src/app/endpoints/streaming_query.py (1)
src/models/responses.py (2)
  • UnauthorizedResponse (361-375)
  • ForbiddenResponse (378-390)
🪛 markdownlint-cli2 (0.17.2)
docs/output.md

77-77: Tables should be surrounded by blank lines

(MD058, blanks-around-tables)

docs/openapi.md

77-77: Tables should be surrounded by blank lines

(MD058, blanks-around-tables)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: e2e_tests
  • GitHub Check: build-pr
🔇 Additional comments (20)
src/app/endpoints/models.py (1)

49-49: LGTM! Status code alignment for connectivity errors.

The change from 503 to 500 aligns with the PR-wide update to use HTTP 500 for backend connectivity issues. The error payload remains unchanged, maintaining backward compatibility for clients that parse the response body.
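For reference, a minimal sketch of what such a responses entry can look like after the change. This uses the valid description/content form (rather than the bare "detail" key flagged elsewhere in this review); the exact payload shape in models.py is not quoted here.

from typing import Any

# Sketch of the updated responses map for the models endpoint; the 500 entry
# replaces the former 503 while keeping the same error payload.
models_responses: dict[int | str, dict[str, Any]] = {
    500: {
        "description": "Connection to Llama Stack is broken",
        "content": {
            "application/json": {
                "example": {
                    "response": "Unable to connect to Llama Stack",
                    "cause": "Connection error.",
                }
            }
        },
    },
}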

src/app/endpoints/query.py (1)

62-62: LGTM! Consistent error code update.

The status code change from 503 to 500 for Llama Stack connectivity errors is consistent with the models endpoint and aligns with the PR-wide standardization.

src/app/endpoints/info.py (1)

50-50: LGTM! Documentation improvement.

The docstring update accurately reflects that the endpoint returns the Llama Stack version in addition to the service name and version. This aligns with the InfoResponse model which already includes the llama_stack_version field.

docs/output.md (4)

37-37: LGTM! Documentation accurately reflects endpoint changes.

The documentation correctly describes the info endpoint's response, including the Llama Stack version.


111-111: LGTM! Status code standardization documented.

The documentation correctly reflects the change from 503 to 500 for connectivity errors.


146-150: LGTM! Comprehensive SSE documentation for streaming endpoint.

The updated documentation clearly describes:

  • The 200 response as SSE with text/event-stream content type
  • New error responses (400, 401, 403, 500) with appropriate models
  • Maintains existing 422 validation error

276-277: LGTM! Consistent error response documentation for conversations endpoints.

All three conversation endpoints now consistently document:

  • 400: Missing or invalid credentials
  • 401: Invalid or missing Bearer token

This aligns with the expanded error surface across the API.

Also applies to: 313-314, 346-347

src/app/endpoints/conversations.py (2)

22-22: LGTM! UnauthorizedResponse import added for error documentation.

The import is appropriately used to document potential authentication errors in the OpenAPI responses.


49-56: Auth middleware raises 400/401 — OpenAPI entries are accurate.
src/authentication/jwk_token.py raises HTTPException(status.HTTP_400_BAD_REQUEST) for token DecodeError/JoseError (≈lines 145–151) and HTTPException(status.HTTP_401_UNAUTHORIZED) for KeyNotFound/BadSignature/ExpiredToken/missing-claim cases (≈lines 134–136, 138–141, 162–169, 177–189).
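For illustration, a condensed sketch of that 400-vs-401 split. The real module's JOSE library and exception types differ; PyJWT is used here purely as a stand-in.

import jwt  # PyJWT, a stand-in for the project's actual JOSE library
from fastapi import HTTPException, status


def validate_token(token: str, key: str) -> dict:
    """Map token problems to 400 (malformed) or 401 (failed verification)."""
    try:
        return jwt.decode(token, key, algorithms=["RS256"])
    except jwt.DecodeError as exc:
        # Not a parseable JWT at all: the request itself is malformed.
        raise HTTPException(
            status.HTTP_400_BAD_REQUEST, detail="Invalid token format"
        ) from exc
    except jwt.ExpiredSignatureError as exc:
        # Well-formed but unverifiable or expired: authentication failed.
        raise HTTPException(
            status.HTTP_401_UNAUTHORIZED, detail="Token has expired"
        ) from exc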

tests/integration/test_openapi_json.py (2)

62-62: LGTM! Test expectations updated correctly.

The test expectations have been properly updated to reflect:

  • Status code changes from 503 to 500 for connectivity errors
  • Addition of 400/401 credential-related errors where appropriate

Also applies to: 63-63


69-79: LGTM! Comprehensive test coverage for conversations endpoints.

All three conversation endpoints now expect the expanded error surface (400, 401) in addition to their existing error codes. The formatting change improves readability.
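A sketch of how such expectations can be encoded; the endpoint list, the exact code sets, and the openapi_spec fixture are illustrative assumptions, not quotes from the test file.

# Illustrative expectations only; the real test's data may be organized differently.
EXPECTED_RESPONSES: dict[str, set[str]] = {
    "/v1/models": {"200", "500"},
    "/v1/query": {"200", "400", "401", "403", "422", "500"},
    "/v1/streaming_query": {"200", "400", "401", "403", "422", "500"},
}


def test_endpoint_response_codes(openapi_spec: dict) -> None:
    """Each documented operation must expose exactly the expected status codes."""
    for path, expected in EXPECTED_RESPONSES.items():
        for operation in openapi_spec["paths"][path].values():
            assert set(operation["responses"]) == expected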

docs/openapi.md (2)

37-37: LGTM! Documentation updates align with code changes.

The OpenAPI documentation correctly reflects:

  1. Info endpoint now mentions Llama Stack version
  2. Models and Query endpoints use 500 for connectivity errors (previously 503)

Also applies to: 77-77, 111-111


146-150: LGTM! Comprehensive error response documentation.

The expanded error surface is well-documented:

  • Streaming endpoint: SSE response (200) and comprehensive errors (400/401/403/500)
  • Conversations endpoints: Consistent 400/401 credential errors

Also applies to: 276-277, 313-314, 346-347

src/app/endpoints/streaming_query.py (6)

32-32: LGTM! Appropriate imports for error response models.

The imports of UnauthorizedResponse and ForbiddenResponse are used correctly in the streaming_query_responses dictionary.


56-94: Excellent SSE response documentation!

The streaming_query_responses dictionary provides comprehensive OpenAPI documentation:

  • 200 response properly specifies text/event-stream content type with a clear example
  • Error responses (400, 401, 403, 500) are well-defined with appropriate models
  • The SSE event format example helps API consumers understand the expected stream structure

100-112: Well-structured SSE formatting utility.

The format_stream_data function properly formats SSE events with the required "data: " prefix and double newline suffix.
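Based on that description, the helper plausibly reduces to a one-liner like this sketch:

import json
from typing import Any


def format_stream_data(data: dict[str, Any]) -> str:
    """Serialize one event dict into a single SSE frame."""
    # SSE frames need the "data: " prefix and a blank line as a separator.
    return f"data: {json.dumps(data)}\n\n"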


114-136: Clean SSE event generation functions.

The stream_start_event and stream_end_event functions provide a clear structure for SSE lifecycle events. The end event includes comprehensive metadata about referenced documents and token usage placeholders.

Also applies to: 138-174
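A sketch of the lifecycle helpers consistent with the documented SSE example; the field names come from that example, the internals are assumptions, and format_stream_data refers to the sketch above.

def stream_start_event(conversation_id: str) -> str:
    """First frame: tells the client which conversation the stream belongs to."""
    return format_stream_data(
        {"event": "start", "data": {"conversation_id": conversation_id}}
    )


def stream_end_event(metadata_map: dict) -> str:
    """Final frame: referenced documents plus token-usage placeholders."""
    return format_stream_data(
        {
            "event": "end",
            "data": {
                "referenced_documents": list(metadata_map.values()),  # assumed mapping
                "truncated": None,
                "input_tokens": 0,
                "output_tokens": 0,
            },
            "available_quotas": {},
        }
    )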


177-220: Well-organized event dispatcher with comprehensive handling.

The stream_build_event function uses pattern matching effectively to route different event types to their respective handlers. The catch-all heartbeat ensures the stream stays alive.
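The dispatcher described might be shaped roughly like this; the chunk attribute names and event-type strings are assumptions for illustration, and format_stream_data is the sketch above.

from typing import Any


def stream_build_event(chunk: Any, chunk_id: int) -> str:
    """Route a Llama Stack chunk to the matching SSE frame."""
    match getattr(chunk.event, "event_type", None):
        case "step_progress":
            # Token deltas from inference become "token" events.
            return format_stream_data(
                {"event": "token", "data": {"id": chunk_id, "token": chunk.event.delta}}
            )
        case "step_complete":
            # Completed tool calls become "tool" events.
            return format_stream_data({"event": "tool", "data": {"id": chunk_id}})
        case _:
            # Catch-all heartbeat keeps the connection alive between events.
            return format_stream_data({"event": "heartbeat", "data": {"id": chunk_id}})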


563-563: LGTM! Endpoint properly decorated with responses.

The streaming_query endpoint is correctly decorated with the streaming_query_responses dictionary, ensuring proper OpenAPI documentation.

docs/openapi.json (1)

250-258: 403 mapping looks good.

Comment on lines +260 to +266
"500": {
"description": "Internal Server Error",
"detail": {
"response": "Unable to connect to Llama Stack",
"cause": "Connection error."
}
},

⚠️ Potential issue

Invalid OpenAPI: “detail” at top level in 500 response; move under content.

Same issue as /v1/query. Use ErrorResponse.

-                    "500": {
-                        "description": "Internal Server Error",
-                        "detail": {
-                            "response": "Unable to connect to Llama Stack",
-                            "cause": "Connection error."
-                        }
-                    },
+                    "500": {
+                        "description": "Internal Server Error",
+                        "content": {
+                            "application/json": {
+                                "schema": { "$ref": "#/components/schemas/ErrorResponse" }
+                            }
+                        }
+                    },
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"500": {
"description": "Internal Server Error",
"detail": {
"response": "Unable to connect to Llama Stack",
"cause": "Connection error."
}
},
"500": {
"description": "Internal Server Error",
"content": {
"application/json": {
"schema": { "$ref": "#/components/schemas/ErrorResponse" }
}
}
},
🤖 Prompt for AI Agents
In docs/openapi.json around lines 260 to 266, the 500 response currently
contains a top-level "detail" object which is invalid OpenAPI; move that error
payload under the response's "content" (e.g., application/json) and reference or
use the existing ErrorResponse schema instead of an inline "detail" object so
the response body is properly described (replace the inline detail with a
content block that references ErrorResponse).

@radofuchs force-pushed the LCORE_432_update_doc_endpoints branch from bdfe3cd to 911945d on September 18, 2025 at 11:53
@tisnik (Contributor) left a comment

LGTM

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/app/endpoints/streaming_query.py (1)

563-570: Set SSE content-type and disable buffering on StreamingResponse.

File: src/app/endpoints/streaming_query.py:713 — return currently lacks SSE media type and no-buffering headers.

-        return StreamingResponse(response_generator(response))
+        return StreamingResponse(
+            response_generator(response),
+            media_type="text/event-stream",
+            headers={
+                "Cache-Control": "no-cache",
+                "Connection": "keep-alive",
+                "X-Accel-Buffering": "no",
+            },
+        )
🧹 Nitpick comments (12)
docs/openapi.md (6)

37-37: Capitalize and punctuate consistently.

Prefer “service name, version, and Llama Stack version.”

- service name, version and Llama-stack version.
+ service name, version, and Llama Stack version.

77-77: Tighten 500 description; add surrounding blank lines for MD058.

Phrase as “Connection to Llama Stack failed” and ensure a blank line before/after the table to satisfy markdownlint MD058.

-| 500 | Connection to Llama Stack is broken |  |
+| 500 | Connection to Llama Stack failed |  |

146-150: SSE docs polish and consistent error component.

  • Replace placeholder “…string” with “string”; call out content type.
  • Add ErrorResponse component for 500 like other endpoints.
-| 200 | Streaming response with Server-Sent Events | ...string |
+| 200 | Streaming response with Server-Sent Events (text/event-stream) | string |
...
-| 500 | Internal Server Error |  |
+| 500 | Internal Server Error | [ErrorResponse](#errorresponse) |

276-277: Clarify 400 vs 401 semantics.

If 400 is specific to noop/noop-with-token auth modules, say so; otherwise consider 401 for auth failures.

-| 400 | Missing or invalid credentials provided by client | [UnauthorizedResponse](#unauthorizedresponse) |
+| 400 | Missing or invalid credentials (noop/noop-with-token auth) | [UnauthorizedResponse](#unauthorizedresponse) |

313-314: Mirror the 400 wording here for consistency.

-| 400 | Missing or invalid credentials provided by client | [UnauthorizedResponse](#unauthorizedresponse) |
+| 400 | Missing or invalid credentials (noop/noop-with-token auth) | [UnauthorizedResponse](#unauthorizedresponse) |

346-347: Mirror the 400 wording here as well.

-| 400 | Missing or invalid credentials provided by client | [UnauthorizedResponse](#unauthorizedresponse) |
+| 400 | Missing or invalid credentials (noop/noop-with-token auth) | [UnauthorizedResponse](#unauthorizedresponse) |
docs/output.md (6)

37-37: Capitalize and punctuate consistently.

Use “service name, version, and Llama Stack version.”

- service name, version and Llama-stack version.
+ service name, version, and Llama Stack version.

77-77: Refine 500 wording; fix MD058 blanks-around-tables.

Use “Connection to Llama Stack failed” and add a blank line above/below the table.

-| 500 | Connection to Llama Stack is broken |  |
+| 500 | Connection to Llama Stack failed |  |

146-150: SSE section: fix component and 500 schema reference.

  • Replace “…string” with “string”; call out text/event-stream.
  • Add ErrorResponse to 500.
-| 200 | Streaming response with Server-Sent Events | ...string |
+| 200 | Streaming response with Server-Sent Events (text/event-stream) | string |
...
-| 500 | Internal Server Error |  |
+| 500 | Internal Server Error | [ErrorResponse](#errorresponse) |

276-277: Clarify 400 vs 401 semantics (noop/noop-with-token).

-| 400 | Missing or invalid credentials provided by client | [UnauthorizedResponse](#unauthorizedresponse) |
+| 400 | Missing or invalid credentials (noop/noop-with-token auth) | [UnauthorizedResponse](#unauthorizedresponse) |

313-314: Same clarification here.

-| 400 | Missing or invalid credentials provided by client | [UnauthorizedResponse](#unauthorizedresponse) |
+| 400 | Missing or invalid credentials (noop/noop-with-token auth) | [UnauthorizedResponse](#unauthorizedresponse) |

346-347: Same clarification for DELETE endpoint.

-| 400 | Missing or invalid credentials provided by client | [UnauthorizedResponse](#unauthorizedresponse) |
+| 400 | Missing or invalid credentials (noop/noop-with-token auth) | [UnauthorizedResponse](#unauthorizedresponse) |
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dcfb068 and 911945d.

📒 Files selected for processing (9)
  • docs/openapi.json (7 hunks)
  • docs/openapi.md (7 hunks)
  • docs/output.md (7 hunks)
  • src/app/endpoints/conversations.py (4 hunks)
  • src/app/endpoints/info.py (1 hunks)
  • src/app/endpoints/models.py (1 hunks)
  • src/app/endpoints/query.py (1 hunks)
  • src/app/endpoints/streaming_query.py (3 hunks)
  • tests/integration/test_openapi_json.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • docs/openapi.json
🚧 Files skipped from review as they are similar to previous changes (5)
  • tests/integration/test_openapi_json.py
  • src/app/endpoints/query.py
  • src/app/endpoints/models.py
  • src/app/endpoints/conversations.py
  • src/app/endpoints/info.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/app/endpoints/streaming_query.py (1)
src/models/responses.py (2)
  • UnauthorizedResponse (361-375)
  • ForbiddenResponse (378-390)
🪛 markdownlint-cli2 (0.17.2)
docs/openapi.md

77-77: Tables should be surrounded by blank lines

(MD058, blanks-around-tables)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-pr
  • GitHub Check: e2e_tests
🔇 Additional comments (3)
src/app/endpoints/streaming_query.py (1)

32-32: Imports for auth error models look correct.

docs/openapi.md (1)

111-111: 503→500 change acknowledged — openapi.json and tests verified.

Verified: /v1/query in docs/openapi.json contains a 500 response and tests/integration/test_openapi_json.py expects 500; health tests still assert 503 in tests/unit/app/endpoints/test_health.py.

docs/output.md (1)

111-111: 503→500 acknowledged — OpenAPI JSON & tests in sync.
Verified docs/output.md, docs/openapi.json and tests/integration/test_openapi_json.py show 500 for /v1/query; other endpoints that should be 503 remain unchanged.

Comment on lines +56 to +94
streaming_query_responses: dict[int | str, dict[str, Any]] = {
    200: {
        "description": "Streaming response with Server-Sent Events",
        "content": {
            "text/event-stream": {
                "schema": {
                    "type": "string",
                    "example": (
                        'data: {"event": "start", '
                        '"data": {"conversation_id": "123e4567-e89b-12d3-a456-426614174000"}}\n\n'
                        'data: {"event": "token", "data": {"id": 0, "role": "inference", '
                        '"token": "Hello"}}\n\n'
                        'data: {"event": "end", "data": {"referenced_documents": [], '
                        '"truncated": null, "input_tokens": 0, "output_tokens": 0}, '
                        '"available_quotas": {}}\n\n'
                    ),
                }
            }
        },
    },
    400: {
        "description": "Missing or invalid credentials provided by client",
        "model": UnauthorizedResponse,
    },
    401: {
        "description": "Unauthorized: Invalid or missing Bearer token for k8s auth",
        "model": UnauthorizedResponse,
    },
    403: {
        "description": "User is not authorized",
        "model": ForbiddenResponse,
    },
    500: {
        "detail": {
            "response": "Unable to connect to Llama Stack",
            "cause": "Connection error.",
        }
    },
}

⚠️ Potential issue

Fix FastAPI responses spec: 500 block uses unsupported “detail” key.

FastAPI expects description/model/content. Use description + JSON example.

-        "detail": {
-            "response": "Unable to connect to Llama Stack",
-            "cause": "Connection error.",
-        }
+        "description": "Internal Server Error",
+        "content": {
+            "application/json": {
+                "example": {
+                    "response": "Unable to connect to Llama Stack",
+                    "cause": "Connection error."
+                }
+            }
+        },
     },
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
streaming_query_responses: dict[int | str, dict[str, Any]] = {
    200: {
        "description": "Streaming response with Server-Sent Events",
        "content": {
            "text/event-stream": {
                "schema": {
                    "type": "string",
                    "example": (
                        'data: {"event": "start", '
                        '"data": {"conversation_id": "123e4567-e89b-12d3-a456-426614174000"}}\n\n'
                        'data: {"event": "token", "data": {"id": 0, "role": "inference", '
                        '"token": "Hello"}}\n\n'
                        'data: {"event": "end", "data": {"referenced_documents": [], '
                        '"truncated": null, "input_tokens": 0, "output_tokens": 0}, '
                        '"available_quotas": {}}\n\n'
                    ),
                }
            }
        },
    },
    400: {
        "description": "Missing or invalid credentials provided by client",
        "model": UnauthorizedResponse,
    },
    401: {
        "description": "Unauthorized: Invalid or missing Bearer token for k8s auth",
        "model": UnauthorizedResponse,
    },
    403: {
        "description": "User is not authorized",
        "model": ForbiddenResponse,
    },
    500: {
        "detail": {
            "response": "Unable to connect to Llama Stack",
            "cause": "Connection error.",
        }
    },
}
streaming_query_responses: dict[int | str, dict[str, Any]] = {
    200: {
        "description": "Streaming response with Server-Sent Events",
        "content": {
            "text/event-stream": {
                "schema": {
                    "type": "string",
                    "example": (
                        'data: {"event": "start", '
                        '"data": {"conversation_id": "123e4567-e89b-12d3-a456-426614174000"}}\n\n'
                        'data: {"event": "token", "data": {"id": 0, "role": "inference", '
                        '"token": "Hello"}}\n\n'
                        'data: {"event": "end", "data": {"referenced_documents": [], '
                        '"truncated": null, "input_tokens": 0, "output_tokens": 0}, '
                        '"available_quotas": {}}\n\n'
                    ),
                }
            }
        },
    },
    400: {
        "description": "Missing or invalid credentials provided by client",
        "model": UnauthorizedResponse,
    },
    401: {
        "description": "Unauthorized: Invalid or missing Bearer token for k8s auth",
        "model": UnauthorizedResponse,
    },
    403: {
        "description": "User is not authorized",
        "model": ForbiddenResponse,
    },
    500: {
        "description": "Internal Server Error",
        "content": {
            "application/json": {
                "example": {
                    "response": "Unable to connect to Llama Stack",
                    "cause": "Connection error."
                }
            }
        },
    },
}
🤖 Prompt for AI Agents
In src/app/endpoints/streaming_query.py around lines 56 to 94, the 500 response
entry uses an unsupported "detail" key; replace it with a valid FastAPI response
spec (use "description" and either "model" or "content") and provide a JSON
example under "content" (e.g., "application/json": {"schema":
{"type":"object","example": {"response":"Unable to connect to Llama
Stack","cause":"Connection error."}}}) so the OpenAPI schema is valid and
clients see the example.

⚠️ Potential issue

🧩 Analysis chain

Avoid logging secrets from configuration.

Line 598 logs the full Llama Stack configuration (likely includes api_key). Remove or redact.

-    logger.info("Llama stack config: %s", configuration.llama_stack_configuration)
+    # Avoid leaking secrets (api_key).
+    cfg = configuration.llama_stack_configuration
+    logger.info(
+        "Llama stack config: url=%s, use_as_library_client=%s",
+        getattr(cfg, "url", "<redacted>"),
+        getattr(cfg, "use_as_library_client", "<redacted>"),
+    )

🏁 Script executed:

#!/bin/bash
rg -nP 'logger\.(info|debug).*(llama stack config|api_key)' -C2 src

Length of output: 0


🏁 Script executed:

#!/bin/bash
set -euo pipefail
echo "1) Searching for direct references to Llama config..."
rg -n --hidden -S "llama_stack_configuration|Llama stack|llama stack|LlamaStack" -C3 || true

echo "2) Searching for configuration.llama* usages..."
rg -n --hidden -S "configuration\.llama" -C3 || true

echo "3) Searching for logger calls (inspect surrounding context)..."
rg -n --hidden -S "logger\.(info|debug|warn|warning|error)" -C3 || true

echo "4) Searching for potential secret keys (api_key variations)..."
rg -n --hidden -S "api_key|API_KEY|apikey|apiKey" -C3 || true

Length of output: 214306


Avoid logging secrets — redact Llama Stack configuration (api_key)

Logging the full LlamaStack configuration exposes secrets (api_key/SecretStr). Remove or redact these logger statements.

  • Fix at least: src/app/endpoints/streaming_query.py:598, src/app/endpoints/query.py:181, src/app/endpoints/models.py:81, src/lightspeed_stack.py:62–65, src/configuration.py:52.
-    logger.info("Llama stack config: %s", configuration.llama_stack_configuration)
+    # Avoid leaking secrets (api_key).
+    cfg = configuration.llama_stack_configuration
+    logger.info(
+        "Llama stack config: url=%s, use_as_library_client=%s",
+        getattr(cfg, "url", "<redacted>"),
+        getattr(cfg, "use_as_library_client", "<redacted>"),
+    )

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/app/endpoints/streaming_query.py (lines ~56–94) and specifically
remove/redact secrets at the logger calls indicated in the review:
src/app/endpoints/streaming_query.py:598, src/app/endpoints/query.py:181,
src/app/endpoints/models.py:81, src/lightspeed_stack.py:62–65,
src/configuration.py:52 — the issue is that Llama Stack configuration (api_key /
SecretStr) is being logged; change those logger statements to never print raw
secret values by either removing the log, logging only the presence/absence of a
key (e.g., "api_key: set" or "api_key: not set"), or log a sanitized
representation (mask the key like "***REDACTED***" or use
config.dict(exclude={"api_key"})/exclude_unset to omit secrets); ensure any
SecretStr uses get_secret_value() only when necessary and never passed to
logger, and update all affected lines to use the sanitized/omitted form.

@tisnik merged commit d0d0393 into lightspeed-core:main on September 18, 2025
18 of 19 checks passed