
Conversation

sobychacko
Contributor

Implements prompt caching to reduce costs on repeated content and improve
response times. Applications with large system prompts, extensive tool
definitions, or multi-turn conversations can see significant savings, as
cached content costs ~90% less to process than uncached content.

Adds five caching strategies to address different use cases:

  • `SYSTEM_ONLY`: Cache system messages (most common case: stable instructions)
  • `TOOLS_ONLY`: Cache tool definitions (when tools are stable but the system prompt varies)
  • `SYSTEM_AND_TOOLS`: Cache both (when both are large and stable)
  • `CONVERSATION_HISTORY`: Cache conversation history (for chatbots and assistants)
  • `NONE`: Default; no caching
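The strategies above map naturally onto derived boolean flags (see the implementation notes below). A minimal sketch: the enum name `BedrockCacheStrategy` comes from this PR, but the helper methods (`cachesSystem`, `cachesTools`, `cachesHistory`) are hypothetical illustrations of that idea, not the actual Spring AI API.

```java
// Sketch only: BedrockCacheStrategy is the PR's enum name; the boolean
// helper methods are illustrative, not the real Spring AI implementation.
public class CacheStrategyDemo {

    enum BedrockCacheStrategy {
        NONE, SYSTEM_ONLY, TOOLS_ONLY, SYSTEM_AND_TOOLS, CONVERSATION_HISTORY;

        // True when the strategy places a cache point after the system messages.
        boolean cachesSystem() {
            return this == SYSTEM_ONLY || this == SYSTEM_AND_TOOLS;
        }

        // True when the strategy places a cache point after the tool definitions.
        boolean cachesTools() {
            return this == TOOLS_ONLY || this == SYSTEM_AND_TOOLS;
        }

        // True when the strategy caches the conversation prefix turn by turn.
        boolean cachesHistory() {
            return this == CONVERSATION_HISTORY;
        }
    }

    public static void main(String[] args) {
        for (BedrockCacheStrategy s : BedrockCacheStrategy.values()) {
            System.out.printf("%s system=%b tools=%b history=%b%n",
                    s, s.cachesSystem(), s.cachesTools(), s.cachesHistory());
        }
    }
}
```

Deriving the flags once per request keeps the request-building code free of repeated `strategy == ...` checks.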

Implementation:

  • `BedrockCacheStrategy` enum with `BedrockCacheOptions` configuration class
  • Integrated with `BedrockChatOptions` (`equals`/`hashCode`/`copy` support)
  • Cache points applied as separate blocks to satisfy AWS SDK UNION type constraints where each block can only contain one field type
  • Boolean flags derived from strategy to improve code readability and avoid repetitive conditional checks throughout request building
  • Last user message pattern for `CONVERSATION_HISTORY` enables incremental caching where each turn builds on the previous cached prefix
  • Cache metrics exposed via metadata Map to maintain provider independence without adding Bedrock-specific fields to shared interfaces
  • Cache hierarchy respects AWS cascade invalidation (tools → system → messages) to prevent stale cache combinations
  • Debug logging for troubleshooting cache point application
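The "separate blocks" point can be made concrete with a small sketch. The `ContentBlock` record below is a simplified stand-in for the AWS SDK's union-shaped content type, not the real SDK class: a union block may carry exactly one member, so a text block cannot also hold a cache point, and the cache point must be appended as its own block.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrates why cache points are applied as separate blocks: AWS SDK
// "union" shapes allow exactly one member per block. ContentBlock here is
// a simplified stand-in for the SDK type, not the real AWS class.
public class CachePointDemo {

    // Union-style block: exactly one of the two fields is non-null.
    record ContentBlock(String text, Boolean cachePoint) {
        static ContentBlock ofText(String text) { return new ContentBlock(text, null); }
        static ContentBlock ofCachePoint()      { return new ContentBlock(null, Boolean.TRUE); }
    }

    // Emits the system content, then a separate cache-point block marking
    // everything before it as cacheable.
    static List<ContentBlock> systemBlocks(String systemPrompt, boolean cacheSystem) {
        List<ContentBlock> blocks = new ArrayList<>();
        blocks.add(ContentBlock.ofText(systemPrompt));
        if (cacheSystem) {
            blocks.add(ContentBlock.ofCachePoint());
        }
        return blocks;
    }

    public static void main(String[] args) {
        List<ContentBlock> blocks = systemBlocks("You are a helpful assistant.", true);
        System.out.println(blocks.size() + " blocks"); // prints "2 blocks": text, then cache point
    }
}
```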

Model compatibility:

  • Claude 3.x/4.x: All strategies supported
  • Amazon Nova: `SYSTEM_ONLY` and `CONVERSATION_HISTORY` only (AWS limitation on tool caching for Nova models)

Testing:

  • Integration tests for all strategies using Claude 3.7 Sonnet
  • Tests handle cache TTL overlap between runs to avoid flakiness in CI

Documentation includes usage examples, real-world use cases (legal document
analysis, code review, customer support, multi-tenant SaaS), best practices,
cache invalidation behavior, and cost considerations. Break-even occurs after
one cache hit since cache reads cost ~90% less than base input tokens while
cache writes cost ~25% more.



Signed-off-by: Soby Chacko <[email protected]>