LCORE-432: update doc for endpoints #560
Conversation
Walkthrough

This PR updates OpenAPI docs and server endpoint response schemas, adds SSE documentation and handlers for the streaming query, aligns connectivity errors to 500, and introduces credential/authorization error responses across conversation endpoints. Tests are updated to reflect new response codes. InfoResponse gains llama_stack_version; docs are synced.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
autonumber
participant C as Client
participant API as /v1/streaming_query
participant LS as Llama Stack
participant SSE as SSE Stream
Note over API,SSE: New SSE flow (text/event-stream)
C->>API: POST /v1/streaming_query (request)
API->>SSE: open stream
API->>SSE: event: start
API->>LS: initiate streaming inference
loop chunks
LS-->>API: token/tool/heartbeat chunk
API->>SSE: event: token / tool / heartbeat
end
LS-->>API: done
API->>SSE: event: end (usage, transcript, metadata)
SSE-->>C: streamed events
alt Auth or permission error
API-->>C: 400/401/403 (JSON)
else Connection/internal error
API-->>C: 500 (detail)
end
```
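To make the flow above concrete, here is a minimal client-side sketch of consuming the stream. The base URL, the request payload field, and the event names are assumptions drawn from the examples in this PR, not a verified client for this service.

```python
import json

import requests  # generic HTTP client, used here only for illustration

payload = {"query": "Hello"}  # request field name is an assumption

with requests.post(
    "http://localhost:8080/v1/streaming_query",  # host/port are assumptions
    json=payload,
    stream=True,
    timeout=60,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        # SSE events arrive as "data: {...}" lines separated by blank lines.
        if not line or not line.startswith("data: "):
            continue
        event = json.loads(line[len("data: "):])
        if event.get("event") == "token":
            print(event["data"]["token"], end="", flush=True)
        elif event.get("event") == "end":
            break
```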
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
LGTM overall, you just have two files there that should not be there.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/openapi.json (1)
177-183
: Invalid OpenAPI: “detail” at the top level of a response is not allowed. Move the payload under content with a schema (e.g., ErrorResponse). As written, this won’t validate.

```diff
- "500": {
-   "description": "Internal Server Error",
-   "detail": {
-     "response": "Unable to connect to Llama Stack",
-     "cause": "Connection error."
-   }
- },
+ "500": {
+   "description": "Internal Server Error",
+   "content": {
+     "application/json": {
+       "schema": { "$ref": "#/components/schemas/ErrorResponse" }
+     }
+   }
+ },
```
🧹 Nitpick comments (12)
src/auth/README.md (1)
1-2
: Expand README to convey module purpose and usage.

A single header doesn’t help future contributors. Consider adding: overview, responsibilities, security model, env vars, key files, and how to add a new auth provider.
Apply this minimal scaffold:
```diff
-# List of source files stored in `src/auth` directory
+# Auth module
+
+Purpose: authentication/authorization utilities and integrations.
+
+Includes:
+- Providers: OAuth/OIDC, API key, service tokens
+- Middlewares/guards
+- Token/signing utilities
+
+Conventions:
+- All providers implement a common interface
+- No network calls in pure helpers
+
+Env vars:
+- AUTH_ISSUER, AUTH_AUDIENCE, AUTH_PUBLIC_KEYS_URL (example)
+
+Contributing:
+- Adding a provider: steps, tests, docs links
+
+Ownership: <team/Slack channel>
```

src/services/README.md (1)
1-2
: Document service layer structure and conventions.

Add brief guidance so newcomers know how to navigate and extend services.
Suggested skeleton:
```diff
-# List of source files stored in `src/services` directory
+# Services layer
+
+Purpose: business logic orchestrating repositories, clients, and auth.
+
+Layout:
+- <service_name>Service.* — stateless, dependency-injected
+- Errors: use domain-specific error types
+
+Conventions:
+- Timeouts/retries handled in clients; services remain pure
+- No I/O in pure helpers; side effects isolated
+
+Testing:
+- Unit tests with mocked clients
+- Integration tests for critical flows
+
+Adding a service:
+- Interface, implementation, wiring, tests, docs
+
+Ownership: <team/Slack channel>
```

docs/output.md (1)
77-77
: Table formatting needs blank lines.

Add blank lines before and after the table for proper Markdown formatting:

```diff
 | Status Code | Description | Component |
 |-------------|-------------|-----------|
 | 200 | Successful Response | [ModelsResponse](#modelsresponse) |
 | 500 | Connection to Llama Stack is broken | |
+
 ## POST `/v1/query`
```
docs/openapi.md (1)
77-77
: Table formatting needs blank lines.

Add blank lines around the table for proper Markdown formatting:

```diff
 | Status Code | Description | Component |
 |-------------|-------------|-----------|
 | 200 | Successful Response | [ModelsResponse](#modelsresponse) |
 | 500 | Connection to Llama Stack is broken | |
+
 ## POST `/v1/query`
```
src/app/endpoints/streaming_query.py (1)
640-683
: Verify error handling in the streaming generator.

The response_generator function handles the SSE stream well, but consider what happens if an exception occurs during streaming after the response has started. The client would receive a partial stream without a proper error event.
Consider adding error handling within the generator to emit an error event:
```python
async def response_generator(
    turn_response: AsyncIterator[AgentTurnResponseStreamChunk],
) -> AsyncIterator[str]:
    chunk_id = 0
    summary = TurnSummary(
        llm_response="No response from the model", tool_calls=[]
    )

    # Send start event
    yield stream_start_event(conversation_id)

    try:
        async for chunk in turn_response:
            # ... existing chunk processing ...
        yield stream_end_event(metadata_map)
    except Exception as e:
        # Emit error event to the stream
        logger.exception("Error during streaming: %s", str(e))
        yield format_stream_data({
            "event": "error",
            "data": {
                "id": chunk_id,
                "message": "Stream interrupted due to an error"
            }
        })
        raise  # Re-raise to ensure proper cleanup
    # ... rest of the function
```

docs/openapi.json (7)
53-53
: Tighten wording and casing in description; drop docstring-style “Returns”.

Use consistent “Llama Stack” casing and avoid Python‑docstring phrasing inside OpenAPI descriptions.

```diff
- "description": "Handle request to the /info endpoint.\n\nProcess GET requests to the /info endpoint, returning the\nservice name, version and Llama-stack version.\n\nReturns:\n    InfoResponse: An object containing the service's name and version.",
+ "description": "Handles GET /v1/info and returns service name, service version, and Llama Stack version.",
```
120-122
: 500 response lacks JSON content schema; add ErrorResponse and align semantics.

Right now it only has a description. Either keep 503 for upstream dependency failure or, if standardizing on 500, at least include the schema.

```diff
- "500": {
-   "description": "Connection to Llama Stack is broken"
- }
+ "500": {
+   "description": "Internal Server Error: Unable to connect to Llama Stack",
+   "content": {
+     "application/json": {
+       "schema": { "$ref": "#/components/schemas/ErrorResponse" }
+     }
+   }
+ }
```

Please confirm integration tests were updated to expect 500 with ErrorResponse for /v1/models.
217-229
: SSE 200 response: consider removing application/json stub or document handshake.

If 200 always streams SSE, keep only text/event-stream. If a JSON prelude is emitted, document its structure; empty {} schema is misleading.

```diff
- "content": {
-   "application/json": {
-     "schema": {}
-   },
+ "content": { "text/event-stream": { "schema": { "type": "string", "example": "data: {\"event\": \"start\", \"data\": {\"conversation_id\": \"123e4567-e89b-12d3-a456-426614174000\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 0, \"role\": \"inference\", \"token\": \"Hello\"}}\n\ndata: {\"event\": \"end\", \"data\": {\"referenced_documents\": [], \"truncated\": null, \"input_tokens\": 0, \"output_tokens\": 0}, \"available_quotas\": {}}\n\n" } } }
```

If you keep JSON, add a concrete schema (e.g., a minimal handshake object) and document when it’s sent.
230-249
: 400 vs 401 semantics: clarify boundary and reuse consistent descriptions.

You map both 400 and 401 to UnauthorizedResponse. If 400 is only for noop/noop-with-token modules (as elsewhere), reflect that in the description and ensure clients can distinguish. Otherwise drop 400 here.
Suggested tweak (descriptions only):
- "description": "Missing or invalid credentials provided by client" + "description": "Bad Request: missing/invalid credentials for noop/noop-with-token auth"
519-538
: Conversations list: verify 400/401 usage matches auth backends and keep payload schema consistent.

Both map to UnauthorizedResponse, which is fine; ensure error messages are distinct enough for clients, and confirm 503 vs 500 policy across endpoints.
596-615
: Conversation GET: same 400/401 semantics consistency check.

Align descriptions with actual behavior and keep UnauthorizedResponse for both or specialize schemas if needed.
674-693
: Conversation DELETE: same 400/401 consistency and client‑distinguishable messages.

Also ensure upstream connectivity errors use a consistent status (500 vs 503) across all conversation endpoints.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
- docs/openapi.json (7 hunks)
- docs/openapi.md (7 hunks)
- docs/output.md (7 hunks)
- src/app/endpoints/conversations.py (4 hunks)
- src/app/endpoints/info.py (1 hunks)
- src/app/endpoints/models.py (1 hunks)
- src/app/endpoints/query.py (1 hunks)
- src/app/endpoints/streaming_query.py (3 hunks)
- src/auth/README.md (1 hunks)
- src/services/README.md (1 hunks)
- tests/integration/test_openapi_json.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/app/endpoints/conversations.py (1)
- src/models/responses.py (1)
  - UnauthorizedResponse (361-375)

src/app/endpoints/streaming_query.py (1)
- src/models/responses.py (2)
  - UnauthorizedResponse (361-375)
  - ForbiddenResponse (378-390)
🪛 markdownlint-cli2 (0.17.2)
docs/output.md
77-77: Tables should be surrounded by blank lines
(MD058, blanks-around-tables)
docs/openapi.md
77-77: Tables should be surrounded by blank lines
(MD058, blanks-around-tables)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: e2e_tests
- GitHub Check: build-pr
🔇 Additional comments (20)
src/app/endpoints/models.py (1)
49-49
: LGTM! Status code alignment for connectivity errors.

The change from 503 to 500 aligns with the PR-wide update to use HTTP 500 for backend connectivity issues. The error payload remains unchanged, maintaining backward compatibility for clients that parse the response body.
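For readers who want to picture the mapping being described, a connectivity-error handler in FastAPI typically looks something like the sketch below. This is an illustration, not the PR's actual code; the client call and function name are assumptions.

```python
from fastapi import HTTPException


async def list_models(client) -> list:
    """Return available models, mapping backend connectivity failures to HTTP 500."""
    try:
        return await client.models.list()  # hypothetical Llama Stack client call
    except ConnectionError as exc:
        # Structured detail payload stays the same; only the status code changed (503 -> 500).
        raise HTTPException(
            status_code=500,
            detail={"response": "Unable to connect to Llama Stack", "cause": str(exc)},
        ) from exc
```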
src/app/endpoints/query.py (1)
62-62
: LGTM! Consistent error code update.

The status code change from 503 to 500 for Llama Stack connectivity errors is consistent with the models endpoint and aligns with the PR-wide standardization.
src/app/endpoints/info.py (1)
50-50
: LGTM! Documentation improvement.

The docstring update accurately reflects that the endpoint returns the Llama Stack version in addition to the service name and version. This aligns with the InfoResponse model, which already includes the `llama_stack_version` field.

docs/output.md (4)
37-37
: LGTM! Documentation accurately reflects endpoint changes.

The documentation correctly describes the info endpoint's response, including the Llama Stack version.
111-111
: LGTM! Status code standardization documented.

The documentation correctly reflects the change from 503 to 500 for connectivity errors.
146-150
: LGTM! Comprehensive SSE documentation for streaming endpoint.

The updated documentation clearly describes:
- The 200 response as SSE with text/event-stream content type
- New error responses (400, 401, 403, 500) with appropriate models
- Maintains existing 422 validation error
276-277
: LGTM! Consistent error response documentation for conversations endpoints.

All three conversation endpoints now consistently document:
- 400: Missing or invalid credentials
- 401: Invalid or missing Bearer token
This aligns with the expanded error surface across the API.
Also applies to: 313-314, 346-347
src/app/endpoints/conversations.py (2)
22-22
: LGTM! UnauthorizedResponse import added for error documentation.

The import is appropriately used to document potential authentication errors in the OpenAPI responses.
49-56
: Auth middleware raises 400/401 — OpenAPI entries are accurate.
src/authentication/jwk_token.py raises HTTPException(status.HTTP_400_BAD_REQUEST) for token DecodeError/JoseError (≈lines 145–151) and HTTPException(status.HTTP_401_UNAUTHORIZED) for KeyNotFound/BadSignature/ExpiredToken/missing-claim cases (≈lines 134–136, 138–141, 162–169, 177–189).

tests/integration/test_openapi_json.py (2)
62-62
: LGTM! Test expectations updated correctly.

The test expectations have been properly updated to reflect:
- Status code changes from 503 to 500 for connectivity errors
- Addition of 400/401 credential-related errors where appropriate
Also applies to: 63-63
69-79
: LGTM! Comprehensive test coverage for conversations endpoints.

All three conversation endpoints now expect the expanded error surface (400, 401) in addition to their existing error codes. The formatting change improves readability.
docs/openapi.md (2)
37-37
: LGTM! Documentation updates align with code changes.

The OpenAPI documentation correctly reflects:
- Info endpoint now mentions Llama Stack version
- Models and Query endpoints use 500 for connectivity errors (previously 503)
Also applies to: 77-77, 111-111
146-150
: LGTM! Comprehensive error response documentation.

The expanded error surface is well-documented:
- Streaming endpoint: SSE response (200) and comprehensive errors (400/401/403/500)
- Conversations endpoints: Consistent 400/401 credential errors
Also applies to: 276-277, 313-314, 346-347
src/app/endpoints/streaming_query.py (6)
32-32
: LGTM! Appropriate imports for error response models.

The imports of UnauthorizedResponse and ForbiddenResponse are used correctly in the streaming_query_responses dictionary.
56-94
: Excellent SSE response documentation!

The streaming_query_responses dictionary provides comprehensive OpenAPI documentation:
- 200 response properly specifies text/event-stream content type with a clear example
- Error responses (400, 401, 403, 500) are well-defined with appropriate models
- The SSE event format example helps API consumers understand the expected stream structure
100-112
: Well-structured SSE formatting utility.

The format_stream_data function properly formats SSE events with the required "data: " prefix and double newline suffix.
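For readers unfamiliar with SSE framing, a formatter satisfying that description would look roughly like the sketch below; this is an illustration, not the PR's actual implementation, whose signature may differ.

```python
import json
from typing import Any


def format_stream_data(data: dict[str, Any]) -> str:
    """Frame one event as an SSE message: a "data: " prefix and a blank-line terminator."""
    return f"data: {json.dumps(data)}\n\n"
```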
114-136
: Clean SSE event generation functions.

The stream_start_event and stream_end_event functions provide a clear structure for SSE lifecycle events. The end event includes comprehensive metadata about referenced documents and token usage placeholders.
Also applies to: 138-174
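As a rough illustration of such lifecycle helpers, here is a sketch that reuses the format_stream_data helper sketched above. Field names follow the example payload in streaming_query_responses; how the real functions derive these values (for example, referenced documents from metadata_map) is an assumption.

```python
def stream_start_event(conversation_id: str) -> str:
    # First SSE event: hand the client its conversation id.
    return format_stream_data(
        {"event": "start", "data": {"conversation_id": conversation_id}}
    )


def stream_end_event(metadata_map: dict) -> str:
    # Final SSE event: summary metadata; token counts are placeholders here,
    # and deriving referenced documents from metadata_map keys is an assumption.
    return format_stream_data(
        {
            "event": "end",
            "data": {
                "referenced_documents": list(metadata_map.keys()),
                "truncated": None,
                "input_tokens": 0,
                "output_tokens": 0,
            },
            "available_quotas": {},
        }
    )
```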
177-220
: Well-organized event dispatcher with comprehensive handling.

The stream_build_event function uses pattern matching effectively to route different event types to their respective handlers. The catch-all heartbeat ensures the stream stays alive.
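To illustrate the dispatch pattern being praised here (not the PR's actual code), a simplified sketch could look like the following; the chunk attribute path, event-type names, and returned event shapes are all assumptions, and it again reuses the format_stream_data sketch above.

```python
def stream_build_event(chunk, chunk_id: int, metadata_map: dict) -> str:
    # Route each chunk to an SSE event; anything unrecognized becomes a heartbeat
    # so the connection stays alive. Attribute names below are illustrative only.
    match getattr(chunk.event.payload, "event_type", None):
        case "step_progress":
            token = getattr(chunk.event.payload, "delta", "")
            return format_stream_data(
                {"event": "token", "data": {"id": chunk_id, "token": token}}
            )
        case "step_complete":
            return format_stream_data({"event": "tool", "data": {"id": chunk_id}})
        case _:
            return format_stream_data({"event": "heartbeat", "data": {"id": chunk_id}})
```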
563-563
: LGTM! Endpoint properly decorated with responses.

The streaming_query endpoint is correctly decorated with the streaming_query_responses dictionary, ensuring proper OpenAPI documentation.
docs/openapi.json (1)
250-258
: 403 mapping looks good.
"500": { | ||
"description": "Internal Server Error", | ||
"detail": { | ||
"response": "Unable to connect to Llama Stack", | ||
"cause": "Connection error." | ||
} | ||
}, |
Invalid OpenAPI: “detail” at top level in 500 response; move under content.
Same issue as /v1/query. Use ErrorResponse.
- "500": {
- "description": "Internal Server Error",
- "detail": {
- "response": "Unable to connect to Llama Stack",
- "cause": "Connection error."
- }
- },
+ "500": {
+ "description": "Internal Server Error",
+ "content": {
+ "application/json": {
+ "schema": { "$ref": "#/components/schemas/ErrorResponse" }
+ }
+ }
+ },
🤖 Prompt for AI Agents
In docs/openapi.json around lines 260 to 266, the 500 response currently
contains a top-level "detail" object which is invalid OpenAPI; move that error
payload under the response's "content" (e.g., application/json) and reference or
use the existing ErrorResponse schema instead of an inline "detail" object so
the response body is properly described (replace the inline detail with a
content block that references ErrorResponse).
bdfe3cd to 911945d
LGTM
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/app/endpoints/streaming_query.py (1)
563-570
: Set SSE content-type and disable buffering on StreamingResponse.

File: src/app/endpoints/streaming_query.py:713 — return currently lacks SSE media type and no-buffering headers.

```diff
- return StreamingResponse(response_generator(response))
+ return StreamingResponse(
+     response_generator(response),
+     media_type="text/event-stream",
+     headers={
+         "Cache-Control": "no-cache",
+         "Connection": "keep-alive",
+         "X-Accel-Buffering": "no",
+     },
+ )
```
🧹 Nitpick comments (12)
docs/openapi.md (6)
37-37
: Capitalize and punctuate consistently.

Prefer “service name, version, and Llama Stack version.”

```diff
- service name, version and Llama-stack version.
+ service name, version, and Llama Stack version.
```
77-77
: Tighten 500 description; add surrounding blank lines for MD058.

Phrase as “Connection to Llama Stack failed” and ensure a blank line before/after the table to satisfy markdownlint MD058.

```diff
-| 500 | Connection to Llama Stack is broken | |
+| 500 | Connection to Llama Stack failed | |
```
146-150
: SSE docs polish and consistent error component.
- Replace placeholder “…string” with “string”; call out content type.
- Add ErrorResponse component for 500 like other endpoints.
```diff
-| 200 | Streaming response with Server-Sent Events | ...string |
+| 200 | Streaming response with Server-Sent Events (text/event-stream) | string |
 ...
-| 500 | Internal Server Error | |
+| 500 | Internal Server Error | [ErrorResponse](#errorresponse) |
```
276-277
: Clarify 400 vs 401 semantics.

If 400 is specific to noop/noop-with-token auth modules, say so; otherwise consider 401 for auth failures.

```diff
-| 400 | Missing or invalid credentials provided by client | [UnauthorizedResponse](#unauthorizedresponse) |
+| 400 | Missing or invalid credentials (noop/noop-with-token auth) | [UnauthorizedResponse](#unauthorizedresponse) |
```
313-314
: Mirror the 400 wording here for consistency.

```diff
-| 400 | Missing or invalid credentials provided by client | [UnauthorizedResponse](#unauthorizedresponse) |
+| 400 | Missing or invalid credentials (noop/noop-with-token auth) | [UnauthorizedResponse](#unauthorizedresponse) |
```
346-347
: Mirror the 400 wording here as well.

```diff
-| 400 | Missing or invalid credentials provided by client | [UnauthorizedResponse](#unauthorizedresponse) |
+| 400 | Missing or invalid credentials (noop/noop-with-token auth) | [UnauthorizedResponse](#unauthorizedresponse) |
```

docs/output.md (6)
37-37
: Capitalize and punctuate consistently.

Use “service name, version, and Llama Stack version.”

```diff
- service name, version and Llama-stack version.
+ service name, version, and Llama Stack version.
```
77-77
: Refine 500 wording; fix MD058 blanks-around-tables.

Use “Connection to Llama Stack failed” and add a blank line above/below the table.

```diff
-| 500 | Connection to Llama Stack is broken | |
+| 500 | Connection to Llama Stack failed | |
```
146-150
: SSE section: fix component and 500 schema reference.
- Replace “…string” with “string”; call out text/event-stream.
- Add ErrorResponse to 500.
```diff
-| 200 | Streaming response with Server-Sent Events | ...string |
+| 200 | Streaming response with Server-Sent Events (text/event-stream) | string |
 ...
-| 500 | Internal Server Error | |
+| 500 | Internal Server Error | [ErrorResponse](#errorresponse) |
```
276-277
: Clarify 400 vs 401 semantics (noop/noop-with-token).

```diff
-| 400 | Missing or invalid credentials provided by client | [UnauthorizedResponse](#unauthorizedresponse) |
+| 400 | Missing or invalid credentials (noop/noop-with-token auth) | [UnauthorizedResponse](#unauthorizedresponse) |
```
313-314
: Same clarification here.

```diff
-| 400 | Missing or invalid credentials provided by client | [UnauthorizedResponse](#unauthorizedresponse) |
+| 400 | Missing or invalid credentials (noop/noop-with-token auth) | [UnauthorizedResponse](#unauthorizedresponse) |
```
346-347
: Same clarification for DELETE endpoint.

```diff
-| 400 | Missing or invalid credentials provided by client | [UnauthorizedResponse](#unauthorizedresponse) |
+| 400 | Missing or invalid credentials (noop/noop-with-token auth) | [UnauthorizedResponse](#unauthorizedresponse) |
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- docs/openapi.json (7 hunks)
- docs/openapi.md (7 hunks)
- docs/output.md (7 hunks)
- src/app/endpoints/conversations.py (4 hunks)
- src/app/endpoints/info.py (1 hunks)
- src/app/endpoints/models.py (1 hunks)
- src/app/endpoints/query.py (1 hunks)
- src/app/endpoints/streaming_query.py (3 hunks)
- tests/integration/test_openapi_json.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- docs/openapi.json
🚧 Files skipped from review as they are similar to previous changes (5)
- tests/integration/test_openapi_json.py
- src/app/endpoints/query.py
- src/app/endpoints/models.py
- src/app/endpoints/conversations.py
- src/app/endpoints/info.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/app/endpoints/streaming_query.py (1)
- src/models/responses.py (2)
  - UnauthorizedResponse (361-375)
  - ForbiddenResponse (378-390)
🪛 markdownlint-cli2 (0.17.2)
docs/openapi.md
77-77: Tables should be surrounded by blank lines
(MD058, blanks-around-tables)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build-pr
- GitHub Check: e2e_tests
🔇 Additional comments (3)
src/app/endpoints/streaming_query.py (1)
32-32
: Imports for auth error models look correct.

docs/openapi.md (1)
111-111
: 503→500 change acknowledged — openapi.json and tests verified.

Verified: /v1/query in docs/openapi.json contains a 500 response and tests/integration/test_openapi_json.py expects 500; health tests still assert 503 in tests/unit/app/endpoints/test_health.py.
docs/output.md (1)
111-111
: 503→500 acknowledged — OpenAPI JSON & tests in sync.
Verified docs/output.md, docs/openapi.json and tests/integration/test_openapi_json.py show 500 for /v1/query; other endpoints that should be 503 remain unchanged.
```python
streaming_query_responses: dict[int | str, dict[str, Any]] = {
    200: {
        "description": "Streaming response with Server-Sent Events",
        "content": {
            "text/event-stream": {
                "schema": {
                    "type": "string",
                    "example": (
                        'data: {"event": "start", '
                        '"data": {"conversation_id": "123e4567-e89b-12d3-a456-426614174000"}}\n\n'
                        'data: {"event": "token", "data": {"id": 0, "role": "inference", '
                        '"token": "Hello"}}\n\n'
                        'data: {"event": "end", "data": {"referenced_documents": [], '
                        '"truncated": null, "input_tokens": 0, "output_tokens": 0}, '
                        '"available_quotas": {}}\n\n'
                    ),
                }
            }
        },
    },
    400: {
        "description": "Missing or invalid credentials provided by client",
        "model": UnauthorizedResponse,
    },
    401: {
        "description": "Unauthorized: Invalid or missing Bearer token for k8s auth",
        "model": UnauthorizedResponse,
    },
    403: {
        "description": "User is not authorized",
        "model": ForbiddenResponse,
    },
    500: {
        "detail": {
            "response": "Unable to connect to Llama Stack",
            "cause": "Connection error.",
        }
    },
}
```
Fix FastAPI responses spec: 500 block uses unsupported “detail” key.
FastAPI expects description/model/content. Use description + JSON example.
- "detail": {
- "response": "Unable to connect to Llama Stack",
- "cause": "Connection error.",
- }
+ "description": "Internal Server Error",
+ "content": {
+ "application/json": {
+ "example": {
+ "response": "Unable to connect to Llama Stack",
+ "cause": "Connection error."
+ }
+ }
+ },
},
🤖 Prompt for AI Agents
In src/app/endpoints/streaming_query.py around lines 56 to 94, the 500 response
entry uses an unsupported "detail" key; replace it with a valid FastAPI response
spec (use "description" and either "model" or "content") and provide a JSON
example under "content" (e.g., "application/json": {"schema":
{"type":"object","example": {"response":"Unable to connect to Llama
Stack","cause":"Connection error."}}}) so the OpenAPI schema is valid and
clients see the example.
🧩 Analysis chain
Avoid logging secrets from configuration.
Line 598 logs the full Llama Stack configuration (likely includes api_key). Remove or redact.
- logger.info("Llama stack config: %s", configuration.llama_stack_configuration)
+ # Avoid leaking secrets (api_key).
+ cfg = configuration.llama_stack_configuration
+ logger.info(
+ "Llama stack config: url=%s, use_as_library_client=%s",
+ getattr(cfg, "url", "<redacted>"),
+ getattr(cfg, "use_as_library_client", "<redacted>"),
+ )
🏁 Script executed:
```bash
#!/bin/bash
rg -nP 'logger\.(info|debug).*(llama stack config|api_key)' -C2 src
```
Length of output: 0
🏁 Script executed:
```bash
#!/bin/bash
set -euo pipefail

echo "1) Searching for direct references to Llama config..."
rg -n --hidden -S "llama_stack_configuration|Llama stack|llama stack|LlamaStack" -C3 || true

echo "2) Searching for configuration.llama* usages..."
rg -n --hidden -S "configuration\.llama" -C3 || true

echo "3) Searching for logger calls (inspect surrounding context)..."
rg -n --hidden -S "logger\.(info|debug|warn|warning|error)" -C3 || true

echo "4) Searching for potential secret keys (api_key variations)..."
rg -n --hidden -S "api_key|API_KEY|apikey|apiKey" -C3 || true
```
Length of output: 214306
Avoid logging secrets — redact Llama Stack configuration (api_key)
Logging the full LlamaStack configuration exposes secrets (api_key/SecretStr). Remove or redact these logger statements.
- Fix at least: src/app/endpoints/streaming_query.py:598, src/app/endpoints/query.py:181, src/app/endpoints/models.py:81, src/lightspeed_stack.py:62–65, src/configuration.py:52.
- logger.info("Llama stack config: %s", configuration.llama_stack_configuration)
+ # Avoid leaking secrets (api_key).
+ cfg = configuration.llama_stack_configuration
+ logger.info(
+ "Llama stack config: url=%s, use_as_library_client=%s",
+ getattr(cfg, "url", "<redacted>"),
+ getattr(cfg, "use_as_library_client", "<redacted>"),
+ )
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In src/app/endpoints/streaming_query.py (lines ~56–94) and specifically
remove/redact secrets at the logger calls indicated in the review:
src/app/endpoints/streaming_query.py:598, src/app/endpoints/query.py:181,
src/app/endpoints/models.py:81, src/lightspeed_stack.py:62–65,
src/configuration.py:52 — the issue is that Llama Stack configuration (api_key /
SecretStr) is being logged; change those logger statements to never print raw
secret values by either removing the log, logging only the presence/absence of a
key (e.g., "api_key: set" or "api_key: not set"), or log a sanitized
representation (mask the key like "***REDACTED***" or use
config.dict(exclude={"api_key"})/exclude_unset to omit secrets); ensure any
SecretStr uses get_secret_value() only when necessary and never passed to
logger, and update all affected lines to use the sanitized/omitted form.
Description
Type of change
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Tests