
Conversation

@radofuchs (Contributor) commented Sep 18, 2025

Description

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement

Related Tickets & Documents

  • Related Issue #LCORE-432
  • Closes #LCORE-432

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • New Features

    • Streaming endpoint now supports Server-Sent Events with example payloads.
    • Info now includes Llama Stack version.
    • Query requests support a new media_type field.
  • Bug Fixes

    • Standardized connectivity errors from 503 to 500 across info/models/query.
    • Added explicit 400/401/403 authentication/authorization errors to streaming and conversations endpoints.
  • Documentation

    • OpenAPI and docs updated for SSE responses, new fields, and revised error codes.
    • Feedback example updated to negative sentiment with categories.
  • Tests

    • OpenAPI spec tests updated to reflect new response codes and endpoints.

@radofuchs requested a review from tisnik on September 18, 2025 at 10:56
coderabbitai bot (Contributor) commented Sep 18, 2025

Walkthrough

This PR updates OpenAPI docs and server endpoint response schemas, adds SSE documentation and handlers for the streaming query, aligns connectivity errors to 500, and introduces credential/authorization error responses across conversation endpoints. Tests are updated to reflect new response codes. InfoResponse gains llama_stack_version; docs are synced.

Changes

  • OpenAPI and Docs (docs/openapi.json, docs/openapi.md, docs/output.md): Syncs API docs: adds llama_stack_version to InfoResponse; documents SSE for POST /v1/streaming_query; adds 400/401/403/500 responses; maps 503→500 for models/query/info; updates examples and wording.
  • Streaming SSE Implementation (src/app/endpoints/streaming_query.py): Introduces SSE utilities and streaming lifecycle; adds streaming_query_responses with SSE example; wires 400/401 (UnauthorizedResponse), 403 (ForbiddenResponse), 500; updates handler to emit start/token/end and collect metrics.
  • Conversations Responses (src/app/endpoints/conversations.py): Imports UnauthorizedResponse; adds 400/401 to list/get/delete conversations response maps; no control-flow changes.
  • Models/Query/Info Responses (src/app/endpoints/models.py, src/app/endpoints/query.py, src/app/endpoints/info.py): Maps connectivity errors to 500 (models/query); updates info docstring to mention Llama-stack version; no runtime logic changes.
  • OpenAPI Tests (tests/integration/test_openapi_json.py): Adjusts expected response codes for affected endpoints (adds 400/401, replaces 503 with 500, updates streaming/query expectations).

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant C as Client
  participant API as /v1/streaming_query
  participant LS as Llama Stack
  participant SSE as SSE Stream

  Note over API,SSE: New SSE flow (text/event-stream)
  C->>API: POST /v1/streaming_query (request)
  API->>SSE: open stream
  API->>SSE: event: start
  API->>LS: initiate streaming inference
  loop chunks
    LS-->>API: token/tool/heartbeat chunk
    API->>SSE: event: token / tool / heartbeat
  end
  LS-->>API: done
  API->>SSE: event: end (usage, transcript, metadata)
  SSE-->>C: streamed events
  alt Auth or permission error
    API-->>C: 400/401/403 (JSON)
  else Connection/internal error
    API-->>C: 500 (detail)
  end
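To make the flow above concrete, here is a minimal client sketch that consumes the stream. The base URL and the request field name are assumptions for illustration, not taken from the PR; any streaming-capable HTTP client would work in place of httpx.

import json

import httpx  # assumed client; any streaming-capable HTTP library works


def stream_query(prompt: str) -> None:
    """Consume the /v1/streaming_query SSE stream and print each event."""
    payload = {"query": prompt}  # request body shape is an assumption
    with httpx.stream(
        "POST", "http://localhost:8080/v1/streaming_query", json=payload
    ) as response:
        response.raise_for_status()
        for line in response.iter_lines():
            # Each SSE frame arrives as a "data: <json>" line followed by a blank line.
            if line.startswith("data: "):
                event = json.loads(line[len("data: "):])
                print(event["event"], event.get("data"))


if __name__ == "__main__":
    stream_query("Hello")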

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • tisnik
  • umago
  • manstis
  • maorfr

Poem

Hoppity hop, I stream with glee,
Tokens drift by like leaves from a tree.
Start, token, end—an SSE tune,
Credentials checked, errors in tune.
Models now five-hundred when stacks won’t play,
Llama-stack version lights the way.
Thump-thump—ship it today! 🐇✨

Pre-merge checks

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The title "LCORE-432: update doc for endpoints" is related to the changeset because substantial documentation/OpenAPI and test updates are present, but it understates the scope: the PR also introduces public API signature changes (new fields on InfoResponse/QueryRequest) and a new SSE streaming endpoint in source code, so the title is accurate but not fully descriptive. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |


@tisnik (Contributor) left a comment

LGTM overall; you just have two files there that should not be there.

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/openapi.json (1)

177-183: Invalid OpenAPI: “detail” at the top level of a response is not allowed.

Move the payload under content with a schema (e.g., ErrorResponse). As written, this won’t validate.

-                    "500": {
-                        "description": "Internal Server Error",
-                        "detail": {
-                            "response": "Unable to connect to Llama Stack",
-                            "cause": "Connection error."
-                        }
-                    },
+                    "500": {
+                        "description": "Internal Server Error",
+                        "content": {
+                            "application/json": {
+                                "schema": { "$ref": "#/components/schemas/ErrorResponse" }
+                            }
+                        }
+                    },
🧹 Nitpick comments (12)
src/auth/README.md (1)

1-2: Expand README to convey module purpose and usage.

A single header doesn’t help future contributors. Consider adding: overview, responsibilities, security model, env vars, key files, and how to add a new auth provider.

Apply this minimal scaffold:

-# List of source files stored in `src/auth` directory
+# Auth module
+
+Purpose: authentication/authorization utilities and integrations.
+
+Includes:
+- Providers: OAuth/OIDC, API key, service tokens
+- Middlewares/guards
+- Token/signing utilities
+
+Conventions:
+- All providers implement a common interface
+- No network calls in pure helpers
+
+Env vars:
+- AUTH_ISSUER, AUTH_AUDIENCE, AUTH_PUBLIC_KEYS_URL (example)
+
+Contributing:
+- Adding a provider: steps, tests, docs links
+
+Ownership: <team/Slack channel>
src/services/README.md (1)

1-2: Document service layer structure and conventions.

Add brief guidance so newcomers know how to navigate and extend services.

Suggested skeleton:

-# List of source files stored in `src/services` directory
+# Services layer
+
+Purpose: business logic orchestrating repositories, clients, and auth.
+
+Layout:
+- <service_name>Service.* — stateless, dependency-injected
+- Errors: use domain-specific error types
+
+Conventions:
+- Timeouts/retries handled in clients; services remain pure
+- No I/O in pure helpers; side effects isolated
+
+Testing:
+- Unit tests with mocked clients
+- Integration tests for critical flows
+
+Adding a service:
+- Interface, implementation, wiring, tests, docs
+
+Ownership: <team/Slack channel>
docs/output.md (1)

77-77: Table formatting needs blank lines.

Add blank lines before and after the table for proper Markdown formatting:

 | Status Code | Description | Component |
 |-------------|-------------|-----------|
 | 200 | Successful Response | [ModelsResponse](#modelsresponse) |
 | 500 | Connection to Llama Stack is broken |  |
+
 ## POST `/v1/query`
docs/openapi.md (1)

77-77: Table formatting needs blank lines.

Add blank lines around the table for proper Markdown formatting:

 | Status Code | Description | Component |
 |-------------|-------------|-----------|
 | 200 | Successful Response | [ModelsResponse](#modelsresponse) |
 | 500 | Connection to Llama Stack is broken |  |
+
 ## POST `/v1/query`
src/app/endpoints/streaming_query.py (1)

640-683: Verify error handling in the streaming generator.

The response_generator function handles the SSE stream well, but consider what happens if an exception occurs during streaming after the response has started. The client would receive a partial stream without a proper error event.

Consider adding error handling within the generator to emit an error event:

async def response_generator(
    turn_response: AsyncIterator[AgentTurnResponseStreamChunk],
) -> AsyncIterator[str]:
    chunk_id = 0
    summary = TurnSummary(
        llm_response="No response from the model", tool_calls=[]
    )
    
    # Send start event
    yield stream_start_event(conversation_id)
    
    try:
        async for chunk in turn_response:
            # ... existing chunk processing ...
            
        yield stream_end_event(metadata_map)
    except Exception as e:
        # Emit error event to the stream
        logger.exception("Error during streaming: %s", str(e))
        yield format_stream_data({
            "event": "error",
            "data": {
                "id": chunk_id,
                "message": "Stream interrupted due to an error"
            }
        })
        raise  # Re-raise to ensure proper cleanup
    
    # ... rest of the function
docs/openapi.json (7)

53-53: Tighten wording and casing in description; drop docstring-style “Returns”.

Use consistent “Llama Stack” casing and avoid Python‑docstring phrasing inside OpenAPI descriptions.

- "description": "Handle request to the /info endpoint.\n\nProcess GET requests to the /info endpoint, returning the\nservice name, version and Llama-stack version.\n\nReturns:\n    InfoResponse: An object containing the service's name and version.",
+ "description": "Handles GET /v1/info and returns service name, service version, and Llama Stack version.",

120-122: 500 response lacks JSON content schema; add ErrorResponse and align semantics.

Right now it only has a description. Either keep 503 for upstream dependency failure or, if standardizing on 500, at least include the schema.

-                    "500": {
-                        "description": "Connection to Llama Stack is broken"
-                    }
+                    "500": {
+                        "description": "Internal Server Error: Unable to connect to Llama Stack",
+                        "content": {
+                            "application/json": {
+                                "schema": { "$ref": "#/components/schemas/ErrorResponse" }
+                            }
+                        }
+                    }

Please confirm integration tests were updated to expect 500 with ErrorResponse for /v1/models.


217-229: SSE 200 response: consider removing application/json stub or document handshake.

If 200 always streams SSE, keep only text/event-stream. If a JSON prelude is emitted, document its structure; empty {} schema is misleading.

-                        "content": {
-                            "application/json": {
-                                "schema": {}
-                            },
+                        "content": {
                             "text/event-stream": {
                                 "schema": {
                                     "type": "string",
                                     "example": "data: {\"event\": \"start\", \"data\": {\"conversation_id\": \"123e4567-e89b-12d3-a456-426614174000\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 0, \"role\": \"inference\", \"token\": \"Hello\"}}\n\ndata: {\"event\": \"end\", \"data\": {\"referenced_documents\": [], \"truncated\": null, \"input_tokens\": 0, \"output_tokens\": 0}, \"available_quotas\": {}}\n\n"
                                 }
                             }
                         }

If you keep JSON, add a concrete schema (e.g., a minimal handshake object) and document when it’s sent.


230-249: 400 vs 401 semantics: clarify boundary and reuse consistent descriptions.

You map both 400 and 401 to UnauthorizedResponse. If 400 is only for noop/noop-with-token modules (as elsewhere), reflect that in the description and ensure clients can distinguish. Otherwise drop 400 here.

Suggested tweak (descriptions only):

- "description": "Missing or invalid credentials provided by client"
+ "description": "Bad Request: missing/invalid credentials for noop/noop-with-token auth"

519-538: Conversations list: verify 400/401 usage matches auth backends and keep payload schema consistent.

Both map to UnauthorizedResponse, which is fine; ensure error messages are distinct enough for clients, and confirm 503 vs 500 policy across endpoints.


596-615: Conversation GET: same 400/401 semantics consistency check.

Align descriptions with actual behavior and keep UnauthorizedResponse for both or specialize schemas if needed.


674-693: Conversation DELETE: same 400/401 consistency and client‑distinguishable messages.

Also ensure upstream connectivity errors use a consistent status (500 vs 503) across all conversation endpoints.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5eb2caa and dcfb068.

📒 Files selected for processing (11)
  • docs/openapi.json (7 hunks)
  • docs/openapi.md (7 hunks)
  • docs/output.md (7 hunks)
  • src/app/endpoints/conversations.py (4 hunks)
  • src/app/endpoints/info.py (1 hunks)
  • src/app/endpoints/models.py (1 hunks)
  • src/app/endpoints/query.py (1 hunks)
  • src/app/endpoints/streaming_query.py (3 hunks)
  • src/auth/README.md (1 hunks)
  • src/services/README.md (1 hunks)
  • tests/integration/test_openapi_json.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/app/endpoints/conversations.py (1)
src/models/responses.py (1)
  • UnauthorizedResponse (361-375)
src/app/endpoints/streaming_query.py (1)
src/models/responses.py (2)
  • UnauthorizedResponse (361-375)
  • ForbiddenResponse (378-390)
🪛 markdownlint-cli2 (0.17.2)
docs/output.md

77-77: Tables should be surrounded by blank lines

(MD058, blanks-around-tables)

docs/openapi.md

77-77: Tables should be surrounded by blank lines

(MD058, blanks-around-tables)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: e2e_tests
  • GitHub Check: build-pr
🔇 Additional comments (20)
src/app/endpoints/models.py (1)

49-49: LGTM! Status code alignment for connectivity errors.

The change from 503 to 500 aligns with the PR-wide update to use HTTP 500 for backend connectivity issues. The error payload remains unchanged, maintaining backward compatibility for clients that parse the response body.
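For reference, a minimal sketch of what such a responses entry can look like after the change. This uses the valid description/content form (rather than the bare "detail" key flagged elsewhere in this review); the exact payload shape in models.py is not quoted here.

from typing import Any

# Sketch of the updated responses map for the models endpoint; the 500 entry
# replaces the former 503 while keeping the same error payload.
models_responses: dict[int | str, dict[str, Any]] = {
    500: {
        "description": "Connection to Llama Stack is broken",
        "content": {
            "application/json": {
                "example": {
                    "response": "Unable to connect to Llama Stack",
                    "cause": "Connection error.",
                }
            }
        },
    },
}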

src/app/endpoints/query.py (1)

62-62: LGTM! Consistent error code update.

The status code change from 503 to 500 for Llama Stack connectivity errors is consistent with the models endpoint and aligns with the PR-wide standardization.

src/app/endpoints/info.py (1)

50-50: LGTM! Documentation improvement.

The docstring update accurately reflects that the endpoint returns the Llama Stack version in addition to the service name and version. This aligns with the InfoResponse model which already includes the llama_stack_version field.

docs/output.md (4)

37-37: LGTM! Documentation accurately reflects endpoint changes.

The documentation correctly describes the info endpoint's response, including the Llama Stack version.


111-111: LGTM! Status code standardization documented.

The documentation correctly reflects the change from 503 to 500 for connectivity errors.


146-150: LGTM! Comprehensive SSE documentation for streaming endpoint.

The updated documentation clearly describes:

  • The 200 response as SSE with text/event-stream content type
  • New error responses (400, 401, 403, 500) with appropriate models
  • Maintains existing 422 validation error

276-277: LGTM! Consistent error response documentation for conversations endpoints.

All three conversation endpoints now consistently document:

  • 400: Missing or invalid credentials
  • 401: Invalid or missing Bearer token

This aligns with the expanded error surface across the API.

Also applies to: 313-314, 346-347

src/app/endpoints/conversations.py (2)

22-22: LGTM! UnauthorizedResponse import added for error documentation.

The import is appropriately used to document potential authentication errors in the OpenAPI responses.


49-56: Auth middleware raises 400/401 — OpenAPI entries are accurate.
src/authentication/jwk_token.py raises HTTPException(status.HTTP_400_BAD_REQUEST) for token DecodeError/JoseError (≈lines 145–151) and HTTPException(status.HTTP_401_UNAUTHORIZED) for KeyNotFound/BadSignature/ExpiredToken/missing-claim cases (≈lines 134–136, 138–141, 162–169, 177–189).
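For illustration, a condensed sketch of that 400-vs-401 split. The real module's JOSE library and exception types differ; PyJWT is used here purely as a stand-in.

import jwt  # PyJWT, a stand-in for the project's actual JOSE library
from fastapi import HTTPException, status


def validate_token(token: str, key: str) -> dict:
    """Map token problems to 400 (malformed) or 401 (failed verification)."""
    try:
        return jwt.decode(token, key, algorithms=["RS256"])
    except jwt.DecodeError as exc:
        # Not a parseable JWT at all: the request itself is malformed.
        raise HTTPException(
            status.HTTP_400_BAD_REQUEST, detail="Invalid token format"
        ) from exc
    except jwt.ExpiredSignatureError as exc:
        # Well-formed but unverifiable or expired: authentication failed.
        raise HTTPException(
            status.HTTP_401_UNAUTHORIZED, detail="Token has expired"
        ) from exc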

tests/integration/test_openapi_json.py (2)

62-62: LGTM! Test expectations updated correctly.

The test expectations have been properly updated to reflect:

  • Status code changes from 503 to 500 for connectivity errors
  • Addition of 400/401 credential-related errors where appropriate

Also applies to: 63-63


69-79: LGTM! Comprehensive test coverage for conversations endpoints.

All three conversation endpoints now expect the expanded error surface (400, 401) in addition to their existing error codes. The formatting change improves readability.
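A sketch of how such expectations can be encoded; the endpoint list, the exact code sets, and the openapi_spec fixture are illustrative assumptions, not quotes from the test file.

# Illustrative expectations only; the real test's data may be organized differently.
EXPECTED_RESPONSES: dict[str, set[str]] = {
    "/v1/models": {"200", "500"},
    "/v1/query": {"200", "400", "401", "403", "422", "500"},
    "/v1/streaming_query": {"200", "400", "401", "403", "422", "500"},
}


def test_endpoint_response_codes(openapi_spec: dict) -> None:
    """Each documented operation must expose exactly the expected status codes."""
    for path, expected in EXPECTED_RESPONSES.items():
        for operation in openapi_spec["paths"][path].values():
            assert set(operation["responses"]) == expected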

docs/openapi.md (2)

37-37: LGTM! Documentation updates align with code changes.

The OpenAPI documentation correctly reflects:

  1. Info endpoint now mentions Llama Stack version
  2. Models and Query endpoints use 500 for connectivity errors (previously 503)

Also applies to: 77-77, 111-111


146-150: LGTM! Comprehensive error response documentation.

The expanded error surface is well-documented:

  • Streaming endpoint: SSE response (200) and comprehensive errors (400/401/403/500)
  • Conversations endpoints: Consistent 400/401 credential errors

Also applies to: 276-277, 313-314, 346-347

src/app/endpoints/streaming_query.py (6)

32-32: LGTM! Appropriate imports for error response models.

The imports of UnauthorizedResponse and ForbiddenResponse are used correctly in the streaming_query_responses dictionary.


56-94: Excellent SSE response documentation!

The streaming_query_responses dictionary provides comprehensive OpenAPI documentation:

  • 200 response properly specifies text/event-stream content type with a clear example
  • Error responses (400, 401, 403, 500) are well-defined with appropriate models
  • The SSE event format example helps API consumers understand the expected stream structure

100-112: Well-structured SSE formatting utility.

The format_stream_data function properly formats SSE events with the required "data: " prefix and double newline suffix.
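Based on that description, the helper plausibly reduces to a one-liner like this sketch:

import json
from typing import Any


def format_stream_data(data: dict[str, Any]) -> str:
    """Serialize one event dict into a single SSE frame."""
    # SSE frames need the "data: " prefix and a blank line as a separator.
    return f"data: {json.dumps(data)}\n\n"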


114-136: Clean SSE event generation functions.

The stream_start_event and stream_end_event functions provide a clear structure for SSE lifecycle events. The end event includes comprehensive metadata about referenced documents and token usage placeholders.

Also applies to: 138-174
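A sketch of the lifecycle helpers consistent with the documented SSE example; the field names come from that example, the internals are assumptions, and format_stream_data refers to the sketch above.

def stream_start_event(conversation_id: str) -> str:
    """First frame: tells the client which conversation the stream belongs to."""
    return format_stream_data(
        {"event": "start", "data": {"conversation_id": conversation_id}}
    )


def stream_end_event(metadata_map: dict) -> str:
    """Final frame: referenced documents plus token-usage placeholders."""
    return format_stream_data(
        {
            "event": "end",
            "data": {
                "referenced_documents": list(metadata_map.values()),  # assumed mapping
                "truncated": None,
                "input_tokens": 0,
                "output_tokens": 0,
            },
            "available_quotas": {},
        }
    )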


177-220: Well-organized event dispatcher with comprehensive handling.

The stream_build_event function uses pattern matching effectively to route different event types to their respective handlers. The catch-all heartbeat ensures the stream stays alive.
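The dispatcher described might be shaped roughly like this; the chunk attribute names and event-type strings are assumptions for illustration, and format_stream_data is the sketch above.

from typing import Any


def stream_build_event(chunk: Any, chunk_id: int) -> str:
    """Route a Llama Stack chunk to the matching SSE frame."""
    match getattr(chunk.event, "event_type", None):
        case "step_progress":
            # Token deltas from inference become "token" events.
            return format_stream_data(
                {"event": "token", "data": {"id": chunk_id, "token": chunk.event.delta}}
            )
        case "step_complete":
            # Completed tool calls become "tool" events.
            return format_stream_data({"event": "tool", "data": {"id": chunk_id}})
        case _:
            # Catch-all heartbeat keeps the connection alive between events.
            return format_stream_data({"event": "heartbeat", "data": {"id": chunk_id}})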


563-563: LGTM! Endpoint properly decorated with responses.

The streaming_query endpoint is correctly decorated with the streaming_query_responses dictionary, ensuring proper OpenAPI documentation.

docs/openapi.json (1)

250-258: 403 mapping looks good.

Comment on lines +260 to +266
"500": {
"description": "Internal Server Error",
"detail": {
"response": "Unable to connect to Llama Stack",
"cause": "Connection error."
}
},

⚠️ Potential issue

Invalid OpenAPI: “detail” at top level in 500 response; move under content.

Same issue as /v1/query. Use ErrorResponse.

-                    "500": {
-                        "description": "Internal Server Error",
-                        "detail": {
-                            "response": "Unable to connect to Llama Stack",
-                            "cause": "Connection error."
-                        }
-                    },
+                    "500": {
+                        "description": "Internal Server Error",
+                        "content": {
+                            "application/json": {
+                                "schema": { "$ref": "#/components/schemas/ErrorResponse" }
+                            }
+                        }
+                    },
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"500": {
"description": "Internal Server Error",
"detail": {
"response": "Unable to connect to Llama Stack",
"cause": "Connection error."
}
},
"500": {
"description": "Internal Server Error",
"content": {
"application/json": {
"schema": { "$ref": "#/components/schemas/ErrorResponse" }
}
}
},
🤖 Prompt for AI Agents
In docs/openapi.json around lines 260 to 266, the 500 response currently
contains a top-level "detail" object which is invalid OpenAPI; move that error
payload under the response's "content" (e.g., application/json) and reference or
use the existing ErrorResponse schema instead of an inline "detail" object so
the response body is properly described (replace the inline detail with a
content block that references ErrorResponse).

@radofuchs force-pushed the LCORE_432_update_doc_endpoints branch from bdfe3cd to 911945d on September 18, 2025 at 11:53
@tisnik (Contributor) left a comment

LGTM

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/app/endpoints/streaming_query.py (1)

563-570: Set SSE content-type and disable buffering on StreamingResponse.

File: src/app/endpoints/streaming_query.py:713 — return currently lacks SSE media type and no-buffering headers.

-        return StreamingResponse(response_generator(response))
+        return StreamingResponse(
+            response_generator(response),
+            media_type="text/event-stream",
+            headers={
+                "Cache-Control": "no-cache",
+                "Connection": "keep-alive",
+                "X-Accel-Buffering": "no",
+            },
+        )
🧹 Nitpick comments (12)
docs/openapi.md (6)

37-37: Capitalize and punctuate consistently.

Prefer “service name, version, and Llama Stack version.”

- service name, version and Llama-stack version.
+ service name, version, and Llama Stack version.

77-77: Tighten 500 description; add surrounding blank lines for MD058.

Phrase as “Connection to Llama Stack failed” and ensure a blank line before/after the table to satisfy markdownlint MD058.

-| 500 | Connection to Llama Stack is broken |  |
+| 500 | Connection to Llama Stack failed |  |

146-150: SSE docs polish and consistent error component.

  • Replace placeholder “…string” with “string”; call out content type.
  • Add ErrorResponse component for 500 like other endpoints.
-| 200 | Streaming response with Server-Sent Events | ...string |
+| 200 | Streaming response with Server-Sent Events (text/event-stream) | string |
...
-| 500 | Internal Server Error |  |
+| 500 | Internal Server Error | [ErrorResponse](#errorresponse) |

276-277: Clarify 400 vs 401 semantics.

If 400 is specific to noop/noop-with-token auth modules, say so; otherwise consider 401 for auth failures.

-| 400 | Missing or invalid credentials provided by client | [UnauthorizedResponse](#unauthorizedresponse) |
+| 400 | Missing or invalid credentials (noop/noop-with-token auth) | [UnauthorizedResponse](#unauthorizedresponse) |

313-314: Mirror the 400 wording here for consistency.

-| 400 | Missing or invalid credentials provided by client | [UnauthorizedResponse](#unauthorizedresponse) |
+| 400 | Missing or invalid credentials (noop/noop-with-token auth) | [UnauthorizedResponse](#unauthorizedresponse) |

346-347: Mirror the 400 wording here as well.

-| 400 | Missing or invalid credentials provided by client | [UnauthorizedResponse](#unauthorizedresponse) |
+| 400 | Missing or invalid credentials (noop/noop-with-token auth) | [UnauthorizedResponse](#unauthorizedresponse) |
docs/output.md (6)

37-37: Capitalize and punctuate consistently.

Use “service name, version, and Llama Stack version.”

- service name, version and Llama-stack version.
+ service name, version, and Llama Stack version.

77-77: Refine 500 wording; fix MD058 blanks-around-tables.

Use “Connection to Llama Stack failed” and add a blank line above/below the table.

-| 500 | Connection to Llama Stack is broken |  |
+| 500 | Connection to Llama Stack failed |  |

146-150: SSE section: fix component and 500 schema reference.

  • Replace “…string” with “string”; call out text/event-stream.
  • Add ErrorResponse to 500.
-| 200 | Streaming response with Server-Sent Events | ...string |
+| 200 | Streaming response with Server-Sent Events (text/event-stream) | string |
...
-| 500 | Internal Server Error |  |
+| 500 | Internal Server Error | [ErrorResponse](#errorresponse) |

276-277: Clarify 400 vs 401 semantics (noop/noop-with-token).

-| 400 | Missing or invalid credentials provided by client | [UnauthorizedResponse](#unauthorizedresponse) |
+| 400 | Missing or invalid credentials (noop/noop-with-token auth) | [UnauthorizedResponse](#unauthorizedresponse) |

313-314: Same clarification here.

-| 400 | Missing or invalid credentials provided by client | [UnauthorizedResponse](#unauthorizedresponse) |
+| 400 | Missing or invalid credentials (noop/noop-with-token auth) | [UnauthorizedResponse](#unauthorizedresponse) |

346-347: Same clarification for DELETE endpoint.

-| 400 | Missing or invalid credentials provided by client | [UnauthorizedResponse](#unauthorizedresponse) |
+| 400 | Missing or invalid credentials (noop/noop-with-token auth) | [UnauthorizedResponse](#unauthorizedresponse) |
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dcfb068 and 911945d.

📒 Files selected for processing (9)
  • docs/openapi.json (7 hunks)
  • docs/openapi.md (7 hunks)
  • docs/output.md (7 hunks)
  • src/app/endpoints/conversations.py (4 hunks)
  • src/app/endpoints/info.py (1 hunks)
  • src/app/endpoints/models.py (1 hunks)
  • src/app/endpoints/query.py (1 hunks)
  • src/app/endpoints/streaming_query.py (3 hunks)
  • tests/integration/test_openapi_json.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • docs/openapi.json
🚧 Files skipped from review as they are similar to previous changes (5)
  • tests/integration/test_openapi_json.py
  • src/app/endpoints/query.py
  • src/app/endpoints/models.py
  • src/app/endpoints/conversations.py
  • src/app/endpoints/info.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/app/endpoints/streaming_query.py (1)
src/models/responses.py (2)
  • UnauthorizedResponse (361-375)
  • ForbiddenResponse (378-390)
🪛 markdownlint-cli2 (0.17.2)
docs/openapi.md

77-77: Tables should be surrounded by blank lines

(MD058, blanks-around-tables)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-pr
  • GitHub Check: e2e_tests
🔇 Additional comments (3)
src/app/endpoints/streaming_query.py (1)

32-32: Imports for auth error models look correct.

docs/openapi.md (1)

111-111: 503→500 change acknowledged — openapi.json and tests verified.

Verified: /v1/query in docs/openapi.json contains a 500 response and tests/integration/test_openapi_json.py expects 500; health tests still assert 503 in tests/unit/app/endpoints/test_health.py.

docs/output.md (1)

111-111: 503→500 acknowledged — OpenAPI JSON & tests in sync.
Verified docs/output.md, docs/openapi.json and tests/integration/test_openapi_json.py show 500 for /v1/query; other endpoints that should be 503 remain unchanged.

Comment on lines +56 to +94
streaming_query_responses: dict[int | str, dict[str, Any]] = {
    200: {
        "description": "Streaming response with Server-Sent Events",
        "content": {
            "text/event-stream": {
                "schema": {
                    "type": "string",
                    "example": (
                        'data: {"event": "start", '
                        '"data": {"conversation_id": "123e4567-e89b-12d3-a456-426614174000"}}\n\n'
                        'data: {"event": "token", "data": {"id": 0, "role": "inference", '
                        '"token": "Hello"}}\n\n'
                        'data: {"event": "end", "data": {"referenced_documents": [], '
                        '"truncated": null, "input_tokens": 0, "output_tokens": 0}, '
                        '"available_quotas": {}}\n\n'
                    ),
                }
            }
        },
    },
    400: {
        "description": "Missing or invalid credentials provided by client",
        "model": UnauthorizedResponse,
    },
    401: {
        "description": "Unauthorized: Invalid or missing Bearer token for k8s auth",
        "model": UnauthorizedResponse,
    },
    403: {
        "description": "User is not authorized",
        "model": ForbiddenResponse,
    },
    500: {
        "detail": {
            "response": "Unable to connect to Llama Stack",
            "cause": "Connection error.",
        }
    },
}

⚠️ Potential issue

Fix FastAPI responses spec: 500 block uses unsupported “detail” key.

FastAPI expects description/model/content. Use description + JSON example.

-        "detail": {
-            "response": "Unable to connect to Llama Stack",
-            "cause": "Connection error.",
-        }
+        "description": "Internal Server Error",
+        "content": {
+            "application/json": {
+                "example": {
+                    "response": "Unable to connect to Llama Stack",
+                    "cause": "Connection error."
+                }
+            }
+        },
     },
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
streaming_query_responses: dict[int | str, dict[str, Any]] = {
    200: {
        "description": "Streaming response with Server-Sent Events",
        "content": {
            "text/event-stream": {
                "schema": {
                    "type": "string",
                    "example": (
                        'data: {"event": "start", '
                        '"data": {"conversation_id": "123e4567-e89b-12d3-a456-426614174000"}}\n\n'
                        'data: {"event": "token", "data": {"id": 0, "role": "inference", '
                        '"token": "Hello"}}\n\n'
                        'data: {"event": "end", "data": {"referenced_documents": [], '
                        '"truncated": null, "input_tokens": 0, "output_tokens": 0}, '
                        '"available_quotas": {}}\n\n'
                    ),
                }
            }
        },
    },
    400: {
        "description": "Missing or invalid credentials provided by client",
        "model": UnauthorizedResponse,
    },
    401: {
        "description": "Unauthorized: Invalid or missing Bearer token for k8s auth",
        "model": UnauthorizedResponse,
    },
    403: {
        "description": "User is not authorized",
        "model": ForbiddenResponse,
    },
    500: {
        "detail": {
            "response": "Unable to connect to Llama Stack",
            "cause": "Connection error.",
        }
    },
}
streaming_query_responses: dict[int | str, dict[str, Any]] = {
    200: {
        "description": "Streaming response with Server-Sent Events",
        "content": {
            "text/event-stream": {
                "schema": {
                    "type": "string",
                    "example": (
                        'data: {"event": "start", '
                        '"data": {"conversation_id": "123e4567-e89b-12d3-a456-426614174000"}}\n\n'
                        'data: {"event": "token", "data": {"id": 0, "role": "inference", '
                        '"token": "Hello"}}\n\n'
                        'data: {"event": "end", "data": {"referenced_documents": [], '
                        '"truncated": null, "input_tokens": 0, "output_tokens": 0}, '
                        '"available_quotas": {}}\n\n'
                    ),
                }
            }
        },
    },
    400: {
        "description": "Missing or invalid credentials provided by client",
        "model": UnauthorizedResponse,
    },
    401: {
        "description": "Unauthorized: Invalid or missing Bearer token for k8s auth",
        "model": UnauthorizedResponse,
    },
    403: {
        "description": "User is not authorized",
        "model": ForbiddenResponse,
    },
    500: {
        "description": "Internal Server Error",
        "content": {
            "application/json": {
                "example": {
                    "response": "Unable to connect to Llama Stack",
                    "cause": "Connection error."
                }
            }
        },
    },
}
🤖 Prompt for AI Agents
In src/app/endpoints/streaming_query.py around lines 56 to 94, the 500 response
entry uses an unsupported "detail" key; replace it with a valid FastAPI response
spec (use "description" and either "model" or "content") and provide a JSON
example under "content" (e.g., "application/json": {"schema":
{"type":"object","example": {"response":"Unable to connect to Llama
Stack","cause":"Connection error."}}}) so the OpenAPI schema is valid and
clients see the example.

⚠️ Potential issue

🧩 Analysis chain

Avoid logging secrets from configuration.

Line 598 logs the full Llama Stack configuration (likely includes api_key). Remove or redact.

-    logger.info("Llama stack config: %s", configuration.llama_stack_configuration)
+    # Avoid leaking secrets (api_key).
+    cfg = configuration.llama_stack_configuration
+    logger.info(
+        "Llama stack config: url=%s, use_as_library_client=%s",
+        getattr(cfg, "url", "<redacted>"),
+        getattr(cfg, "use_as_library_client", "<redacted>"),
+    )

🏁 Script executed:

#!/bin/bash
rg -nP 'logger\.(info|debug).*(llama stack config|api_key)' -C2 src

Length of output: 0


🏁 Script executed:

#!/bin/bash
set -euo pipefail
echo "1) Searching for direct references to Llama config..."
rg -n --hidden -S "llama_stack_configuration|Llama stack|llama stack|LlamaStack" -C3 || true

echo "2) Searching for configuration.llama* usages..."
rg -n --hidden -S "configuration\.llama" -C3 || true

echo "3) Searching for logger calls (inspect surrounding context)..."
rg -n --hidden -S "logger\.(info|debug|warn|warning|error)" -C3 || true

echo "4) Searching for potential secret keys (api_key variations)..."
rg -n --hidden -S "api_key|API_KEY|apikey|apiKey" -C3 || true

Length of output: 214306


Avoid logging secrets — redact Llama Stack configuration (api_key)

Logging the full LlamaStack configuration exposes secrets (api_key/SecretStr). Remove or redact these logger statements.

  • Fix at least: src/app/endpoints/streaming_query.py:598, src/app/endpoints/query.py:181, src/app/endpoints/models.py:81, src/lightspeed_stack.py:62–65, src/configuration.py:52.
-    logger.info("Llama stack config: %s", configuration.llama_stack_configuration)
+    # Avoid leaking secrets (api_key).
+    cfg = configuration.llama_stack_configuration
+    logger.info(
+        "Llama stack config: url=%s, use_as_library_client=%s",
+        getattr(cfg, "url", "<redacted>"),
+        getattr(cfg, "use_as_library_client", "<redacted>"),
+    )

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/app/endpoints/streaming_query.py (lines ~56–94) and specifically
remove/redact secrets at the logger calls indicated in the review:
src/app/endpoints/streaming_query.py:598, src/app/endpoints/query.py:181,
src/app/endpoints/models.py:81, src/lightspeed_stack.py:62–65,
src/configuration.py:52 — the issue is that Llama Stack configuration (api_key /
SecretStr) is being logged; change those logger statements to never print raw
secret values by either removing the log, logging only the presence/absence of a
key (e.g., "api_key: set" or "api_key: not set"), or log a sanitized
representation (mask the key like "***REDACTED***" or use
config.dict(exclude={"api_key"})/exclude_unset to omit secrets); ensure any
SecretStr uses get_secret_value() only when necessary and never passed to
logger, and update all affected lines to use the sanitized/omitted form.

@tisnik merged commit d0d0393 into lightspeed-core:main on September 18, 2025
18 of 19 checks passed