[RHDHPAI-1143] Implement referenced_documents caching #643
base: main
Signed-off-by: Maysun J Faisal <[email protected]>
Walkthrough
Adds a GET /v1/tools endpoint and related OpenAPI schemas (ToolsResponse, ByokRag, Action.get_tools). Refactors caching to use a unified CacheEntry with optional AdditionalKwargs (referenced_documents), updating endpoints, utils, and cache backends (SQLite/Postgres schemas). Moves ConversationData to models.responses. Updates message transformation to include optional additional_kwargs. Tests adjusted accordingly.
Sequence Diagram(s)
sequenceDiagram
autonumber
actor User
participant API as API Endpoint
participant LLM as LLM Provider
participant Cache as Cache Backend
rect rgb(245,248,255)
note over API: Query/Streaming flow (new CacheEntry path)
User->>API: Send query
API->>LLM: Request completion/stream
LLM-->>API: Response (+metadata)
API->>API: Build ReferencedDocument list (if any)
API->>API: Create AdditionalKwargs (optional)
API->>API: Create CacheEntry {query,response,provider,model,timestamps,additional_kwargs}
API->>Cache: insert_or_append(CacheEntry)
Cache-->>API: Ack
API-->>User: Return response (+end-of-stream if streaming)
end
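The query/streaming flow in the diagram above can be sketched in plain Python. This is only an illustration: the dataclasses below are standard-library stand-ins for the project's Pydantic models, the field names are taken from the diagram and the tests quoted in this review, and build_cache_entry and serialize_additional_kwargs are hypothetical helpers that do not appear in the PR.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class ReferencedDocument:
    """Stand-in for models.responses.ReferencedDocument (assumed fields)."""
    doc_url: Optional[str] = None
    doc_title: Optional[str] = None


@dataclass
class AdditionalKwargs:
    """Stand-in for models.cache_entry.AdditionalKwargs."""
    referenced_documents: List[ReferencedDocument]


@dataclass
class CacheEntry:
    """Stand-in for models.cache_entry.CacheEntry."""
    query: str
    response: str
    provider: str
    model: str
    started_at: str
    completed_at: str
    additional_kwargs: Optional[AdditionalKwargs] = None


def build_cache_entry(query: str, response: str,
                      docs: List[ReferencedDocument]) -> CacheEntry:
    """Hypothetical helper: attach AdditionalKwargs only when documents exist."""
    extra = AdditionalKwargs(referenced_documents=docs) if docs else None
    return CacheEntry(
        query=query, response=response,
        provider="foo", model="bar",
        started_at="start_time", completed_at="end_time",
        additional_kwargs=extra,
    )


def serialize_additional_kwargs(entry: CacheEntry) -> Optional[str]:
    """Hypothetical helper: JSON text for the cache column, or None if absent."""
    if entry.additional_kwargs is None:
        return None
    return json.dumps(asdict(entry.additional_kwargs))
```

An entry built without documents serializes to None, which corresponds to the backward-compatible case the review mentions for entries that carry no additional_kwargs.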
sequenceDiagram
autonumber
actor Client
participant API as API
participant MCP as MCP Servers
rect rgb(245,255,245)
note over API: /v1/tools (new)
Client->>API: GET /v1/tools
API->>MCP: Fetch tools from configured servers
MCP-->>API: Tools lists
API-->>Client: 200 ToolsResponse { tools: [...] }
end
Estimated code review effort: 4 (Complex), ~60 minutes
Pre-merge checks: 3 passed
Signed-off-by: Maysun J Faisal <[email protected]>
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/cache/postgres_cache.py (1)
41-81: Handle schema migration for additional_kwargs column.
In existing deployments the cache table already exists without this column, so the new SELECT/INSERT paths will immediately raise column "additional_kwargs" does not exist. Please extend initialize_cache() to ALTER TABLE (or otherwise migrate) before we read/write the column.
  logger.info("Initializing table for cache")
  cursor.execute(PostgresCache.CREATE_CACHE_TABLE)
+ logger.info("Ensuring additional_kwargs column exists")
+ cursor.execute(
+     "ALTER TABLE cache ADD COLUMN IF NOT EXISTS additional_kwargs jsonb"
+ )
src/cache/sqlite_cache.py (1)
45-86: Add migration for SQLite additional_kwargs column.
Production environments already have a cache table without this column; the new SELECT will crash with OperationalError: no such column: additional_kwargs. Please teach initialize_cache() to add the column when missing (e.g., check PRAGMA table_info('cache') and issue ALTER TABLE cache ADD COLUMN additional_kwargs TEXT).
  logger.info("Initializing table for cache")
  cursor.execute(SQLiteCache.CREATE_CACHE_TABLE)
+ logger.info("Ensuring additional_kwargs column exists")
+ existing_cols = {
+     row[1] for row in cursor.execute("PRAGMA table_info('cache')")
+ }
+ if "additional_kwargs" not in existing_cols:
+     cursor.execute("ALTER TABLE cache ADD COLUMN additional_kwargs TEXT")
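The SQLite migration suggested in the comment above can be tried in isolation against an in-memory database. The sketch below is a minimal illustration, not the project's initialize_cache(): the table layout is a simplified assumption, and ensure_additional_kwargs_column is a hypothetical helper name.

```python
import sqlite3


def ensure_additional_kwargs_column(conn: sqlite3.Connection) -> bool:
    """Add additional_kwargs to the cache table if missing.

    Returns True when the column was added, False when it already existed.
    """
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing_cols = {row[1] for row in conn.execute("PRAGMA table_info('cache')")}
    if "additional_kwargs" in existing_cols:
        return False
    conn.execute("ALTER TABLE cache ADD COLUMN additional_kwargs TEXT")
    conn.commit()
    return True


# Simulate an existing deployment whose table predates the new column.
# The column list here is an assumption for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cache (user_id TEXT, conversation_id TEXT, query TEXT, response TEXT)"
)

added_first_time = ensure_additional_kwargs_column(conn)
added_second_time = ensure_additional_kwargs_column(conn)
```

Because the helper checks the column set before altering the table, calling it on every startup is safe: the second call is a no-op.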
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (17)
docs/openapi.json (6 hunks)
src/app/endpoints/conversations_v2.py (1 hunks)
src/app/endpoints/query.py (3 hunks)
src/app/endpoints/streaming_query.py (3 hunks)
src/cache/cache.py (1 hunks)
src/cache/in_memory_cache.py (1 hunks)
src/cache/noop_cache.py (1 hunks)
src/cache/postgres_cache.py (6 hunks)
src/cache/sqlite_cache.py (6 hunks)
src/models/cache_entry.py (3 hunks)
src/models/responses.py (2 hunks)
src/utils/endpoints.py (1 hunks)
tests/unit/app/endpoints/test_conversations_v2.py (2 hunks)
tests/unit/app/endpoints/test_query.py (4 hunks)
tests/unit/app/endpoints/test_streaming_query.py (3 hunks)
tests/unit/cache/test_postgres_cache.py (2 hunks)
tests/unit/cache/test_sqlite_cache.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (9)
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use absolute imports for internal modules (e.g., from auth import get_auth_dependency)
Files:
src/app/endpoints/conversations_v2.py
src/utils/endpoints.py
src/cache/noop_cache.py
src/app/endpoints/query.py
src/cache/in_memory_cache.py
src/cache/cache.py
src/cache/sqlite_cache.py
src/models/cache_entry.py
src/app/endpoints/streaming_query.py
src/cache/postgres_cache.py
src/models/responses.py
src/app/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use standard FastAPI imports (from fastapi import APIRouter, HTTPException, Request, status, Depends) in FastAPI app code
Files:
src/app/endpoints/conversations_v2.py
src/app/endpoints/query.py
src/app/endpoints/streaming_query.py
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py
: All modules start with descriptive module-level docstrings explaining purpose
Use logger = logging.getLogger(__name__) for module logging after import logging
Define type aliases at module level for clarity
All functions require docstrings with brief descriptions
Provide complete type annotations for all function parameters and return types
Use typing_extensions.Self in model validators where appropriate
Use modern union syntax (str | int) and Optional[T] or T | None consistently
Function names use snake_case with descriptive, action-oriented prefixes (get_, validate_, check_)
Avoid in-place parameter modification; return new data structures instead of mutating arguments
Use appropriate logging levels: debug, info, warning, error with clear messages
All classes require descriptive docstrings explaining purpose
Class names use PascalCase with conventional suffixes (Configuration, Error/Exception, Resolver, Interface)
Abstract base classes should use abc.ABC and @abstractmethod for interfaces
Provide complete type annotations for all class attributes
Follow Google Python docstring style for modules, classes, and functions, including Args, Returns, Raises, Attributes sections as needed
Files:
src/app/endpoints/conversations_v2.py
src/utils/endpoints.py
src/cache/noop_cache.py
tests/unit/cache/test_postgres_cache.py
src/app/endpoints/query.py
src/cache/in_memory_cache.py
src/cache/cache.py
src/cache/sqlite_cache.py
tests/unit/app/endpoints/test_streaming_query.py
src/models/cache_entry.py
tests/unit/cache/test_sqlite_cache.py
tests/unit/app/endpoints/test_conversations_v2.py
tests/unit/app/endpoints/test_query.py
src/app/endpoints/streaming_query.py
src/cache/postgres_cache.py
src/models/responses.py
src/{app/**/*.py,client.py}
📄 CodeRabbit inference engine (CLAUDE.md)
Use async def for I/O-bound operations and external API calls
Files:
src/app/endpoints/conversations_v2.py
src/app/endpoints/query.py
src/app/endpoints/streaming_query.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
In API endpoints, raise FastAPI HTTPException with appropriate status codes for error handling
Files:
src/app/endpoints/conversations_v2.py
src/app/endpoints/query.py
src/app/endpoints/streaming_query.py
tests/{unit,integration}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/{unit,integration}/**/*.py
: Use pytest for all unit and integration tests
Do not use unittest in tests; pytest is the standard
Files:
tests/unit/cache/test_postgres_cache.py
tests/unit/app/endpoints/test_streaming_query.py
tests/unit/cache/test_sqlite_cache.py
tests/unit/app/endpoints/test_conversations_v2.py
tests/unit/app/endpoints/test_query.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py
: Use pytest-mock to create AsyncMock objects for async interactions in tests
Use the shared auth mock constant: MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") in tests
Files:
tests/unit/cache/test_postgres_cache.py
tests/unit/app/endpoints/test_streaming_query.py
tests/unit/cache/test_sqlite_cache.py
tests/unit/app/endpoints/test_conversations_v2.py
tests/unit/app/endpoints/test_query.py
src/{models/**/*.py,configuration.py}
📄 CodeRabbit inference engine (CLAUDE.md)
src/{models/**/*.py,configuration.py}
: Use @field_validator and @model_validator for custom validation in Pydantic models
Use precise type hints in configuration (e.g., Optional[FilePath], PositiveInt, SecretStr)
Files:
src/models/cache_entry.py
src/models/responses.py
src/models/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/models/**/*.py
: Pydantic models: use BaseModel for data models and extend ConfigurationBase for configuration
Use @model_validator and @field_validator for Pydantic model validation
Files:
src/models/cache_entry.py
src/models/responses.py
🧠 Learnings (1)
📚 Learning: 2025-09-18T16:46:33.353Z
Learnt from: CR
PR: lightspeed-core/lightspeed-stack#0
File: CLAUDE.md:0-0
Timestamp: 2025-09-18T16:46:33.353Z
Learning: Applies to src/client.py : Use Llama Stack client import: from llama_stack_client import AsyncLlamaStackClient
Applied to files:
src/app/endpoints/query.py
🧬 Code graph analysis (14)
src/utils/endpoints.py (1)
- src/models/cache_entry.py (1): CacheEntry (12-29)
src/cache/noop_cache.py (2)
- src/models/cache_entry.py (1): CacheEntry (12-29)
- src/models/responses.py (1): ConversationData (101-112)
tests/unit/cache/test_postgres_cache.py (3)
- src/cache/postgres_cache.py (3): PostgresCache (16-395), insert_or_append (235-285), get (191-232)
- src/models/cache_entry.py (2): AdditionalKwargs (7-9), CacheEntry (12-29)
- src/models/responses.py (2): ReferencedDocument (114-128), ConversationData (101-112)
src/app/endpoints/query.py (2)
- src/models/cache_entry.py (2): CacheEntry (12-29), AdditionalKwargs (7-9)
- src/utils/endpoints.py (1): store_conversation_into_cache (188-208)
src/cache/in_memory_cache.py (2)
- src/models/cache_entry.py (1): CacheEntry (12-29)
- src/models/responses.py (1): ConversationData (101-112)
src/cache/cache.py (2)
- src/models/cache_entry.py (1): CacheEntry (12-29)
- src/models/responses.py (1): ConversationData (101-112)
src/cache/sqlite_cache.py (4)
- src/cache/cache.py (1): Cache (10-129)
- src/models/cache_entry.py (2): CacheEntry (12-29), AdditionalKwargs (7-9)
- src/models/config.py (2): config (139-145), SQLiteDatabaseConfiguration (74-77)
- src/models/responses.py (1): ConversationData (101-112)
tests/unit/app/endpoints/test_streaming_query.py (1)
- src/models/cache_entry.py (1): CacheEntry (12-29)
src/models/cache_entry.py (1)
- src/models/responses.py (1): ReferencedDocument (114-128)
tests/unit/cache/test_sqlite_cache.py (4)
- src/models/cache_entry.py (2): AdditionalKwargs (7-9), CacheEntry (12-29)
- src/models/responses.py (2): ReferencedDocument (114-128), ConversationData (101-112)
- tests/unit/cache/test_postgres_cache.py (2): test_insert_and_get_with_additional_kwargs (385-430), test_insert_and_get_without_additional_kwargs (433-470)
- src/cache/sqlite_cache.py (2): insert_or_append (233-283), get (189-230)
tests/unit/app/endpoints/test_conversations_v2.py (3)
- src/app/endpoints/conversations_v2.py (1): transform_chat_message (241-262)
- src/models/cache_entry.py (2): AdditionalKwargs (7-9), CacheEntry (12-29)
- src/models/responses.py (1): ReferencedDocument (114-128)
tests/unit/app/endpoints/test_query.py (2)
- src/models/cache_entry.py (1): CacheEntry (12-29)
- src/models/responses.py (1): ReferencedDocument (114-128)
src/app/endpoints/streaming_query.py (3)
- src/models/cache_entry.py (2): CacheEntry (12-29), AdditionalKwargs (7-9)
- src/models/responses.py (1): ReferencedDocument (114-128)
- src/utils/endpoints.py (2): create_referenced_documents_with_metadata (520-534), store_conversation_into_cache (188-208)
src/cache/postgres_cache.py (2)
- src/models/cache_entry.py (2): CacheEntry (12-29), AdditionalKwargs (7-9)
- src/models/responses.py (1): ConversationData (101-112)
🔇 Additional comments (11)
src/cache/cache.py (1)
5-6: LGTM! Import refactoring aligns with architectural improvements.
The import path change correctly reflects that ConversationData is now defined in models.responses as a public response model, improving code organization.
src/cache/in_memory_cache.py (1)
4-6: LGTM! Consistent import refactoring.
Import changes align with the architectural refactoring to move ConversationData to the responses module.
src/models/responses.py (2)
101-112: LGTM! Well-structured response model.
The ConversationData model is properly documented with clear field descriptions and appropriate type annotations.
126-128: No unsafe .doc_title usages detected; making it optional is safe.
src/app/endpoints/conversations_v2.py (1)
243-254: LGTM! Improved readability and proper additional_kwargs handling.
The refactoring enhances code clarity by:
- Using explicit user_message and assistant_message variables instead of inline dictionaries
- Conditionally including additional_kwargs via model_dump(), which properly serializes the Pydantic model
- Maintaining type safety with the dict[str, Any] annotation on assistant_message
src/cache/noop_cache.py (1)
4-5: LGTM! Consistent with cache module refactoring.
Import changes align with the architectural refactoring across all cache implementations.
tests/unit/app/endpoints/test_query.py (2)
181-206: LGTM! Test setup properly reflects new caching architecture.
The test updates correctly:
- Mock store_conversation_into_cache to expect a single CacheEntry object
- Create mock ReferencedDocument objects with proper structure
- Update retrieve_response mock to return referenced_documents as the third element
236-245: LGTM! Comprehensive validation of cached entry structure.
The assertions thoroughly validate:
- The cached entry is a CacheEntry instance
- The additional_kwargs field is populated correctly
- The nested referenced_documents structure contains expected data
tests/unit/app/endpoints/test_conversations_v2.py (1)
50-79: LGTM! Thorough test coverage for additional_kwargs transformation.
The new test effectively validates:
- Construction of CacheEntry with AdditionalKwargs and ReferencedDocument objects
- Proper inclusion of additional_kwargs in the assistant message after transformation
- Correct structure and content of nested referenced_documents
tests/unit/cache/test_sqlite_cache.py (2)
365-392: LGTM! Validates round-trip serialization of additional_kwargs.
The test properly verifies:
- CacheEntry with additional_kwargs can be persisted to SQLite
- JSON serialization/deserialization preserves the structure
- Retrieved entry matches the original, including nested referenced_documents
394-411: LGTM! Ensures backward compatibility.
The test confirms that CacheEntry objects without additional_kwargs continue to work correctly, with the field properly stored and retrieved as None.
docs/openapi.json
"example": {
    "tools": [
        {
            "identifier": "",
            "description": "",
            "parameters": [
                {
                    "name": "",
                    "description": "",
                    "parameter_type": "",
                    "required": "True/False",
                    "default": "null"
                }
            ],
            "provider_id": "",
            "toolgroup_id": "",
            "server_source": "",
            "type": "tool"
        }
    ]
}
Fix the /v1/tools example types.
The example shows "required": "True/False" and "default": "null", which are strings rather than valid boolean/null values. This trips downstream schema mockers and confuses consumers trying to copy the sample. Please switch those to proper JSON literals (true/false, null).
🤖 Prompt for AI Agents
In docs/openapi.json around lines 142 to 162, the example for /v1/tools uses
string values "True/False" and "null" for the required and default fields;
change "required" to a boolean (true or false) and "default" to the JSON null
literal (null) so the example uses valid JSON literals rather than strings;
update the example instance to show one concrete boolean (e.g., "required":
true) and "default": null to avoid breaking schema mockers and make the sample
copy-pasteable.
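The difference between the quoted placeholders and real JSON literals is easy to check with the standard library. The snippet below is a small sketch: the two example strings are trimmed-down versions of a single parameter entry, and uses_json_literals is a hypothetical check, not part of the project.

```python
import json

# One parameter entry as currently written in the example, and as suggested.
original_example = '{"name": "", "required": "True/False", "default": "null"}'
corrected_example = '{"name": "", "required": true, "default": null}'


def uses_json_literals(raw: str) -> bool:
    """Return True when `required` is a real boolean and `default` is not the string "null"."""
    param = json.loads(raw)
    return isinstance(param["required"], bool) and param["default"] != "null"


print(uses_json_literals(original_example))   # False: both values parse as strings
print(uses_json_literals(corrected_example))  # True: boolean and null literals
```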
tests/unit/cache/test_postgres_cache.py
docs = [ReferencedDocument(doc_title="Test Doc", doc_url=AnyUrl("http://example.com"))]
kwargs_obj = AdditionalKwargs(referenced_documents=docs)
entry_with_kwargs = CacheEntry(
    query="user message",
    response="AI message",
    provider="foo", model="bar",
    started_at="start_time", completed_at="end_time",
    additional_kwargs=kwargs_obj
)

# Call the insert method
cache.insert_or_append(USER_ID_1, CONVERSATION_ID_1, entry_with_kwargs)

insert_call = mock_cursor.execute.call_args_list[1]
sql_params = insert_call[0][1]
inserted_json_str = sql_params[-1]

assert json.loads(inserted_json_str) == {
    "referenced_documents": [{"doc_title": "Test Doc", "doc_url": "http://example.com/"}]
}
Avoid direct AnyUrl instantiation in tests
Calling AnyUrl("http://example.com") raises TypeError because pydantic’s URL types require scheme/host/etc. when instantiated manually. Use a plain string (pydantic will coerce it) so the test can run.
- docs = [ReferencedDocument(doc_title="Test Doc", doc_url=AnyUrl("http://example.com"))]
+ docs = [ReferencedDocument(doc_title="Test Doc", doc_url="http://example.com/")]
🤖 Prompt for AI Agents
In tests/unit/cache/test_postgres_cache.py around lines 395 to 415, the test
incorrectly constructs a pydantic AnyUrl via AnyUrl("http://example.com") which
raises a TypeError; change the referenced document creation to pass a plain
string for doc_url (e.g., "http://example.com") so pydantic will coerce it when
creating the ReferencedDocument, then keep the rest of the test assertions
unchanged.
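The test above asserts that additional_kwargs reaches the database as a JSON string in the last SQL parameter. The store-and-read round trip behind that assertion can be sketched without a Postgres server by using the standard library's sqlite3 as a stand-in. The table layout and the insert_entry and read_entries helpers are assumptions for illustration; they are not the project's cache API.

```python
import json
import sqlite3
from typing import Any, Dict, List, Optional

conn = sqlite3.connect(":memory:")
# Simplified, assumed schema: additional_kwargs holds JSON text or NULL.
conn.execute("CREATE TABLE cache (query TEXT, response TEXT, additional_kwargs TEXT)")


def insert_entry(query: str, response: str,
                 additional_kwargs: Optional[Dict[str, Any]]) -> None:
    """Store the entry, serializing additional_kwargs to JSON when present."""
    payload = json.dumps(additional_kwargs) if additional_kwargs is not None else None
    conn.execute("INSERT INTO cache VALUES (?, ?, ?)", (query, response, payload))


def read_entries() -> List[Dict[str, Any]]:
    """Read entries back in insertion order, decoding JSON and keeping NULL as None."""
    rows = conn.execute(
        "SELECT query, response, additional_kwargs FROM cache ORDER BY rowid"
    )
    return [
        {
            "query": query,
            "response": response,
            "additional_kwargs": json.loads(raw) if raw is not None else None,
        }
        for query, response, raw in rows
    ]


insert_entry("user message", "AI message", {
    "referenced_documents": [{"doc_title": "Test Doc", "doc_url": "http://example.com/"}]
})
insert_entry("second message", "second reply", None)
entries = read_entries()
```

Keeping None as SQL NULL rather than the JSON text "null" lets entries written before the column existed read back the same way as new entries without referenced documents.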
Description
This PR implements referenced_documents caching for Postgres and SQLite, similar to what we had in road-core/service. v1/query and v1/streaming_query save the referenced_documents from the response in the Postgres and SQLite databases. The cached data can be fetched using v2/conversations/{conversation id}.
The API v2/conversations/{conversation id} returns the cached referenced_documents in format (see screenshot in the comment below):
I tried to keep it similar to how the conversation API worked in road-core/service.
Type of change
Related Tickets & Documents
https://issues.redhat.com/browse/RHDHPAI-1143
Checklist before requesting a review
Testing
v1/query or v1/streaming_query
referenced_documents using v2/conversations/{conversation id}