
Conversation

sobychacko
Contributor

Implements prompt caching to reduce costs on repeated content and improve
response times. Applications with large system prompts, extensive tool
definitions, or multi-turn conversations can see significant savings, as
cached content costs ~90% less to process than uncached content.

Adds five caching strategies to address different use cases:

  • `SYSTEM_ONLY`: Cache system messages (most common case: stable instructions)
  • `TOOLS_ONLY`: Cache tool definitions (when tools are stable but the system prompt varies)
  • `SYSTEM_AND_TOOLS`: Cache both (when both are large and stable)
  • `CONVERSATION_HISTORY`: Cache conversation history (for chatbots and assistants)
  • `NONE`: Default; no caching
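The strategies above map naturally onto derived boolean flags (see the implementation notes below). A minimal sketch: the enum name `BedrockCacheStrategy` comes from this PR, but the helper methods (`cachesSystem`, `cachesTools`, `cachesHistory`) are hypothetical illustrations of that idea, not the actual Spring AI API.

```java
// Sketch only: BedrockCacheStrategy is the PR's enum name; the boolean
// helper methods are illustrative, not the real Spring AI implementation.
public class CacheStrategyDemo {

    enum BedrockCacheStrategy {
        NONE, SYSTEM_ONLY, TOOLS_ONLY, SYSTEM_AND_TOOLS, CONVERSATION_HISTORY;

        // True when the strategy places a cache point after the system messages.
        boolean cachesSystem() {
            return this == SYSTEM_ONLY || this == SYSTEM_AND_TOOLS;
        }

        // True when the strategy places a cache point after the tool definitions.
        boolean cachesTools() {
            return this == TOOLS_ONLY || this == SYSTEM_AND_TOOLS;
        }

        // True when the strategy caches the conversation prefix turn by turn.
        boolean cachesHistory() {
            return this == CONVERSATION_HISTORY;
        }
    }

    public static void main(String[] args) {
        for (BedrockCacheStrategy s : BedrockCacheStrategy.values()) {
            System.out.printf("%s system=%b tools=%b history=%b%n",
                    s, s.cachesSystem(), s.cachesTools(), s.cachesHistory());
        }
    }
}
```

Deriving the flags once per request keeps the request-building code free of repeated `strategy == ...` checks.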

Implementation:

  • `BedrockCacheStrategy` enum with `BedrockCacheOptions` configuration class
  • Integrated with `BedrockChatOptions` (`equals`/`hashCode`/`copy` support)
  • Cache points applied as separate blocks to satisfy AWS SDK UNION type constraints where each block can only contain one field type
  • Boolean flags derived from strategy to improve code readability and avoid repetitive conditional checks throughout request building
  • Last user message pattern for `CONVERSATION_HISTORY` enables incremental caching where each turn builds on the previous cached prefix
  • Cache metrics exposed via metadata Map to maintain provider independence without adding Bedrock-specific fields to shared interfaces
  • Cache hierarchy respects AWS cascade invalidation (tools → system → messages) to prevent stale cache combinations
  • Debug logging for troubleshooting cache point application
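The "separate blocks" point can be made concrete with a small sketch. The `ContentBlock` record below is a simplified stand-in for the AWS SDK's union-shaped content type, not the real SDK class: a union block may carry exactly one member, so a text block cannot also hold a cache point, and the cache point must be appended as its own block.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrates why cache points are applied as separate blocks: AWS SDK
// "union" shapes allow exactly one member per block. ContentBlock here is
// a simplified stand-in for the SDK type, not the real AWS class.
public class CachePointDemo {

    // Union-style block: exactly one of the two fields is non-null.
    record ContentBlock(String text, Boolean cachePoint) {
        static ContentBlock ofText(String text) { return new ContentBlock(text, null); }
        static ContentBlock ofCachePoint()      { return new ContentBlock(null, Boolean.TRUE); }
    }

    // Emits the system content, then a separate cache-point block marking
    // everything before it as cacheable.
    static List<ContentBlock> systemBlocks(String systemPrompt, boolean cacheSystem) {
        List<ContentBlock> blocks = new ArrayList<>();
        blocks.add(ContentBlock.ofText(systemPrompt));
        if (cacheSystem) {
            blocks.add(ContentBlock.ofCachePoint());
        }
        return blocks;
    }

    public static void main(String[] args) {
        List<ContentBlock> blocks = systemBlocks("You are a helpful assistant.", true);
        System.out.println(blocks.size() + " blocks"); // prints "2 blocks": text, then cache point
    }
}
```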

Model compatibility:

  • Claude 3.x/4.x: All strategies supported
  • Amazon Nova: `SYSTEM_ONLY` and `CONVERSATION_HISTORY` only (AWS limitation on tool caching for Nova models)

Testing:

  • Integration tests for all strategies using Claude 3.7 Sonnet
  • Tests handle cache TTL overlap between runs to avoid flakiness in CI

Documentation includes usage examples, real-world use cases (legal document
analysis, code review, customer support, multi-tenant SaaS), best practices,
cache invalidation behavior, and cost considerations. Break-even occurs after
one cache hit since cache reads cost ~90% less than base input tokens while
cache writes cost ~25% more.



Signed-off-by: Soby Chacko <[email protected]>