[RHDHPAI-1143] Implement referenced_documents caching #643

maysunfaisal · 2025-10-08T22:41:22Z

Description

This PR implements referenced_documents caching for the Postgres and SQLite cache backends, similar to what we had in road-core/service.

v1/query and v1/streaming_query save the referenced_documents from the response in the Postgres or SQLite database. The cached data can be fetched using v2/conversations/{conversation id}.

The API v2/conversations/{conversation id} returns the cached referenced_documents in the following format (see the screenshot in the comment below):

"additional_kwargs": {
    "referenced_documents": [
        {
            "doc_url": "https://docs.redhat.com/en/documentation/red_hat_developer_hub/1.7/html-single/customizing_red_hat_developer_hub/index",
            "doc_title": "Customizing Red Hat Developer Hub"
        },
        {
            "doc_url": "https://docs.redhat.com/en/documentation/red_hat_developer_hub/1.7/html-single/interacting_with_red_hat_developer_lightspeed_for_red_hat_developer_hub/index",
            "doc_title": "Interacting with Red Hat Developer Lightspeed for Red Hat Developer Hub"
        },
        {
            "doc_url": "https://docs.redhat.com/en/documentation/red_hat_developer_hub/1.7/html-single/red_hat_developer_hub_release_notes/index",
            "doc_title": "Red Hat Developer Hub release notes"
        }
    ]
}

I tried to keep it similar to how the conversation API worked in road-core/service.
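
A client that consumes this response needs to pull the documents out of the assistant messages. The following is a minimal sketch of one way to do that. Only the `additional_kwargs` / `referenced_documents` shape comes from this PR; the surrounding `chat_history` and `messages` keys in the sample are assumptions made for illustration, which is why the helper walks the payload without relying on them.

```python
from typing import Any, Iterator


def iter_referenced_documents(payload: Any) -> Iterator[dict]:
    """Yield every referenced document found anywhere in a decoded JSON payload."""
    if isinstance(payload, dict):
        kwargs = payload.get("additional_kwargs")
        if isinstance(kwargs, dict):
            # referenced_documents is optional, so fall back to an empty list.
            yield from kwargs.get("referenced_documents") or []
        for key, value in payload.items():
            if key != "additional_kwargs":
                yield from iter_referenced_documents(value)
    elif isinstance(payload, list):
        for item in payload:
            yield from iter_referenced_documents(item)


# Hypothetical response fragment; only "additional_kwargs" follows the PR.
sample = {
    "chat_history": [
        {
            "messages": [
                {"type": "user", "content": "How do I customize RHDH?"},
                {
                    "type": "assistant",
                    "content": "See the customization guide.",
                    "additional_kwargs": {
                        "referenced_documents": [
                            {
                                "doc_url": "https://docs.redhat.com/en/documentation/red_hat_developer_hub/1.7/html-single/customizing_red_hat_developer_hub/index",
                                "doc_title": "Customizing Red Hat Developer Hub",
                            }
                        ]
                    },
                },
            ]
        }
    ]
}

# doc_title may be null, so use .get() rather than indexing.
titles = [doc.get("doc_title") for doc in iter_referenced_documents(sample)]
```

Messages without `additional_kwargs` (user messages, or assistant answers with no RAG hits) simply yield nothing.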

Type of change

Related Tickets & Documents

Related Issue #
https://issues.redhat.com/browse/RHDHPAI-1143

Checklist before requesting a review

I have performed a self-review of my code.
PR has passed all pre-merge test jobs.
If it is a core feature, I have added thorough tests.

Testing

Enable builtin::rag in your Llama Stack configuration and point it to a RAG index.
Send a query to either v1/query or v1/streaming_query.
Fetch the referenced_documents using v2/conversations/{conversation id}.
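
The testing steps above can be scripted. This sketch only builds the two HTTP requests and does not send them. The base URL, the `query` field in the request body, and the absence of authentication headers are assumptions; adjust them to your deployment.

```python
import json
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "http://localhost:8080"  # assumption: address of a local service


def build_query_request(question: str, streaming: bool = False) -> Request:
    """Build a POST request for v1/query or v1/streaming_query."""
    path = "/v1/streaming_query" if streaming else "/v1/query"
    # Assumption: the request body carries the question in a "query" field.
    body = json.dumps({"query": question}).encode("utf-8")
    return Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_conversation_request(conversation_id: str) -> Request:
    """Build a GET request for v2/conversations/{conversation id}."""
    url = f"{BASE_URL}/v2/conversations/{quote(conversation_id, safe='')}"
    return Request(url, method="GET")


query_request = build_query_request("How do I customize Developer Hub?")
conversation_request = build_conversation_request("abc-123")
```

Each request can be passed to `urllib.request.urlopen` once the service is running; the conversation ID for the second call would come from the response to the first.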

Summary by CodeRabbit

New Features
- Added /v1/tools endpoint to retrieve a consolidated list of tools.
- Introduced BYOK RAG configuration support in the public API.
Improvements
- Assistant responses may include optional referenced documents metadata.
- ReferencedDocument titles can now be null.
- Added get_tools to the public actions list.
Documentation
- OpenAPI updated to include the new endpoint, schemas, and enum changes.

Signed-off-by: Maysun J Faisal <[email protected]>

coderabbitai · 2025-10-08T22:41:54Z

Walkthrough

Adds a GET /v1/tools endpoint and related OpenAPI schemas (ToolsResponse, ByokRag, Action.get_tools). Refactors caching to use a unified CacheEntry with optional AdditionalKwargs (referenced_documents), updating endpoints, utils, and cache backends (SQLite/Postgres schemas). Moves ConversationData to models.responses. Updates message transformation to include optional additional_kwargs. Tests adjusted accordingly.
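
To make the unified CacheEntry idea concrete, here is a rough stand-in for the models and the JSON round trip that the SQLite/Postgres backends would need. It uses standard-library dataclasses instead of the project's actual model classes, and the timestamp field names and both helper functions are hypothetical; treat it as a sketch of the shape, not the PR's code.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class ReferencedDocument:
    doc_url: str
    doc_title: Optional[str] = None  # titles may be null after this PR


@dataclass
class AdditionalKwargs:
    referenced_documents: list[ReferencedDocument] = field(default_factory=list)


@dataclass
class CacheEntry:
    query: str
    response: str
    provider: str
    model: str
    started_at: str  # assumed timestamp field name
    completed_at: str  # assumed timestamp field name
    additional_kwargs: Optional[AdditionalKwargs] = None


def dump_additional_kwargs(value: Optional[AdditionalKwargs]) -> Optional[str]:
    """Serialize for a TEXT/JSONB column; None stays NULL in the database."""
    return None if value is None else json.dumps(asdict(value))


def load_additional_kwargs(raw: Optional[str]) -> Optional[AdditionalKwargs]:
    """Rebuild AdditionalKwargs from the stored JSON string."""
    if raw is None:
        return None
    data = json.loads(raw)
    docs = [ReferencedDocument(**doc) for doc in data.get("referenced_documents", [])]
    return AdditionalKwargs(referenced_documents=docs)


entry = CacheEntry(
    query="How do I customize Developer Hub?",
    response="See the customization guide.",
    provider="example-provider",
    model="example-model",
    started_at="2025-10-08T22:41:22Z",
    completed_at="2025-10-08T22:41:25Z",
    additional_kwargs=AdditionalKwargs(
        referenced_documents=[
            ReferencedDocument(
                doc_url="https://docs.redhat.com/en/documentation/red_hat_developer_hub/1.7/html-single/red_hat_developer_hub_release_notes/index",
                doc_title="Red Hat Developer Hub release notes",
            )
        ]
    ),
)

raw = dump_additional_kwargs(entry.additional_kwargs)
restored = load_additional_kwargs(raw)
```

Keeping `additional_kwargs` optional means entries written before this change, or answers with no referenced documents, round-trip as `None`.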

Changes

Cohort / File(s)	Summary
OpenAPI schemas and endpoint `docs/openapi.json`	Adds /v1/tools (GET). Introduces ToolsResponse, ByokRag, extends Action with get_tools. Updates Configuration schema. Makes ReferencedDocument.doc_title nullable and not required.
Endpoints: message transform and caching call sites `src/app/endpoints/conversations_v2.py`, `src/app/endpoints/query.py`, `src/app/endpoints/streaming_query.py`	conversations_v2: transform_chat_message now constructs user/assistant messages explicitly and includes optional additional_kwargs for assistant. query/streaming_query: construct CacheEntry (with optional AdditionalKwargs/referenced_documents) and pass as single arg to store_conversation_into_cache; import AnyUrl and ReferencedDocument for types.
Utils: cache storage API `src/utils/endpoints.py`	Changes store_conversation_into_cache signature to accept a CacheEntry object; removes internal CacheEntry construction; delegates insert to cache backend unchanged otherwise.
Models: cache and response types `src/models/cache_entry.py`, `src/models/responses.py`	Adds AdditionalKwargs (referenced_documents: List[ReferencedDocument]). CacheEntry gains optional additional_kwargs. Moves ConversationData to models.responses. Makes ReferencedDocument.doc_title optional.
Cache backends and interfaces `src/cache/sqlite_cache.py`, `src/cache/postgres_cache.py`, `src/cache/cache.py`, `src/cache/in_memory_cache.py`, `src/cache/noop_cache.py`	SQLite/Postgres: add additional_kwargs column/JSONB, update CREATE/SELECT/INSERT, serialize/deserialize AdditionalKwargs. Update imports to use ConversationData from models.responses. Noop/in-memory/cache module import path adjustments only.
Tests: endpoints `tests/unit/app/endpoints/test_conversations_v2.py`, `tests/unit/app/endpoints/test_query.py`, `tests/unit/app/endpoints/test_streaming_query.py`	Extend tests to validate additional_kwargs propagation (referenced_documents) in messages and that CacheEntry is passed to store_conversation_into_cache with correct fields.
Tests: cache backends `tests/unit/cache/test_sqlite_cache.py`, `tests/unit/cache/test_postgres_cache.py`	Add round-trip tests for additional_kwargs JSON storage/retrieval; adjust imports for ConversationData move; validate ReferencedDocument titles/URLs.
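
The message transform described for conversations_v2 can be pictured roughly as follows. This is an illustrative sketch that takes a plain dict rather than the real CacheEntry; the function name, the `type`/`content` message keys, and the returned layout are assumptions. The point it shows is that `additional_kwargs` is attached to the assistant message only, and only when present.

```python
from typing import Any


def transform_chat_message_sketch(entry: dict[str, Any]) -> dict[str, Any]:
    """Turn one cached exchange into a user message and an assistant message."""
    user_message: dict[str, Any] = {"type": "user", "content": entry["query"]}
    assistant_message: dict[str, Any] = {
        "type": "assistant",
        "content": entry["response"],
    }

    # Only the assistant message carries referenced documents, and the key
    # is omitted entirely when nothing was cached for this exchange.
    additional_kwargs = entry.get("additional_kwargs")
    if additional_kwargs:
        assistant_message["additional_kwargs"] = additional_kwargs

    return {
        "provider": entry.get("provider"),
        "model": entry.get("model"),
        "messages": [user_message, assistant_message],
    }


with_docs = transform_chat_message_sketch(
    {
        "query": "What is new in 1.7?",
        "response": "See the release notes.",
        "provider": "example-provider",
        "model": "example-model",
        "additional_kwargs": {
            "referenced_documents": [
                {
                    "doc_url": "https://docs.redhat.com/en/documentation/red_hat_developer_hub/1.7/html-single/red_hat_developer_hub_release_notes/index",
                    "doc_title": "Red Hat Developer Hub release notes",
                }
            ]
        },
    }
)
without_docs = transform_chat_message_sketch({"query": "Hi", "response": "Hello"})
```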

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant API as API Endpoint
  participant LLM as LLM Provider
  participant Cache as Cache Backend

  rect rgb(245,248,255)
    note over API: Query/Streaming flow (new CacheEntry path)
    User->>API: Send query
    API->>LLM: Request completion/stream
    LLM-->>API: Response (+metadata)
    API->>API: Build ReferencedDocument list (if any)
    API->>API: Create AdditionalKwargs (optional)
    API->>API: Create CacheEntry {query,response,provider,model,timestamps,additional_kwargs}
    API->>Cache: insert_or_append(CacheEntry)
    Cache-->>API: Ack
    API-->>User: Return response (+end-of-stream if streaming)
  end

sequenceDiagram
  autonumber
  actor Client
  participant API as API
  participant MCP as MCP Servers

  rect rgb(245,255,245)
    note over API: /v1/tools (new)
    Client->>API: GET /v1/tools
    API->>MCP: Fetch tools from configured servers
    MCP-->>API: Tools lists
    API-->>Client: 200 ToolsResponse { tools: [...] }
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

[RHDHPAI-978] Topic summary of initial query #564 — Also refactors cache API and related models, overlapping with CacheEntry/ConversationData changes and cache storage calls.
LCORE-632: updated OpenAPI Swagger #647 — Introduces the same OpenAPI additions (ByokRag, ToolsResponse, /v1/tools, Action.get_tools), indicating shared API surface changes.
LCORE-202: Add GET /v1/tools endpoint to list tools from MCP servers #626 — Adds GET /v1/tools and ToolsResponse, directly touching the same endpoint and schema.

Suggested reviewers

tisnik
manstis

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The title succinctly describes the main feature added in this changeset—caching of referenced_documents—directly reflecting the PR objectives and the code changes across the query, streaming, and conversation endpoints while remaining clear and focused.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

maysunfaisal · 2025-10-09T22:15:24Z

Here is an example of how cached referenced_documents are fetched by v2/conversations/{conversation id}

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

src/cache/postgres_cache.py (1)
41-81: Handle schema migration for additional_kwargs column.

In existing deployments the cache table already exists without this column, so the new SELECT/INSERT paths will immediately raise column "additional_kwargs" does not exist. Please extend initialize_cache() to ALTER TABLE (or otherwise migrate) before we read/write the column.
         logger.info("Initializing table for cache")
         cursor.execute(PostgresCache.CREATE_CACHE_TABLE)

+        logger.info("Ensuring additional_kwargs column exists")
+        cursor.execute(
+            "ALTER TABLE cache ADD COLUMN IF NOT EXISTS additional_kwargs jsonb"
+        )
+
src/cache/sqlite_cache.py (1)
45-86: Add migration for SQLite additional_kwargs column.

Production environments already have a cache table without this column; the new SELECT will crash with OperationalError: no such column: additional_kwargs. Please teach initialize_cache() to add the column when missing (e.g., check PRAGMA table_info('cache') and issue ALTER TABLE cache ADD COLUMN additional_kwargs TEXT).
         logger.info("Initializing table for cache")
         cursor.execute(SQLiteCache.CREATE_CACHE_TABLE)

+        logger.info("Ensuring additional_kwargs column exists")
+        existing_cols = {
+            row[1] for row in cursor.execute("PRAGMA table_info('cache')")
+        }
+        if "additional_kwargs" not in existing_cols:
+            cursor.execute("ALTER TABLE cache ADD COLUMN additional_kwargs TEXT")
+
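
The suggested SQLite migration can be tried in isolation against an in-memory database. This sketch assumes a simplified legacy `cache` table (the real table has more columns) and a hypothetical helper name; it shows that the PRAGMA check makes the ALTER TABLE safe to run on every startup and that existing rows read back NULL for the new column.

```python
import sqlite3


def ensure_additional_kwargs_column(connection: sqlite3.Connection) -> bool:
    """Add additional_kwargs to the cache table if missing; True when added."""
    cursor = connection.cursor()
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    existing_cols = {row[1] for row in cursor.execute("PRAGMA table_info('cache')")}
    if "additional_kwargs" in existing_cols:
        return False
    cursor.execute("ALTER TABLE cache ADD COLUMN additional_kwargs TEXT")
    connection.commit()
    return True


# Simplified stand-in for a table created before this PR.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cache (user_id TEXT, conversation_id TEXT, query TEXT, response TEXT)"
)
conn.execute("INSERT INTO cache VALUES ('u1', 'c1', 'question', 'answer')")

added_first_time = ensure_additional_kwargs_column(conn)
added_second_time = ensure_additional_kwargs_column(conn)

# Rows written before the migration have no cached documents.
legacy_row = conn.execute("SELECT additional_kwargs FROM cache").fetchone()
```

Because old rows come back with NULL, the read path has to treat a missing value as "no additional_kwargs" rather than trying to parse it as JSON.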

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 44435d2 and 856f886.

📒 Files selected for processing (17)

docs/openapi.json (6 hunks)
src/app/endpoints/conversations_v2.py (1 hunks)
src/app/endpoints/query.py (3 hunks)
src/app/endpoints/streaming_query.py (3 hunks)
src/cache/cache.py (1 hunks)
src/cache/in_memory_cache.py (1 hunks)
src/cache/noop_cache.py (1 hunks)
src/cache/postgres_cache.py (6 hunks)
src/cache/sqlite_cache.py (6 hunks)
src/models/cache_entry.py (3 hunks)
src/models/responses.py (2 hunks)
src/utils/endpoints.py (1 hunks)
tests/unit/app/endpoints/test_conversations_v2.py (2 hunks)
tests/unit/app/endpoints/test_query.py (4 hunks)
tests/unit/app/endpoints/test_streaming_query.py (3 hunks)
tests/unit/cache/test_postgres_cache.py (2 hunks)
tests/unit/cache/test_sqlite_cache.py (2 hunks)

🧰 Additional context used

📓 Path-based instructions (9)

src/**/*.py