Is your feature request related to a problem? Please describe.
Batch predictions should be able to use a cached context.
Describe the solution you'd like
We are currently working on a classification prompt that requires an extensive system prompt. We have been experimenting with batch processing together with context caching (combining the 50% discount from batch processing with the 75% discount from context caching).
We have tried several approaches now, but the batch job fails. Here's an entry from the resulting `predictions.jsonl`:
{"status":"Internal error occurred. Failed to get generateContentResponse: {\"error\": {\"code\": 404, \"message\": \"Not found: cached content metadata for 3814716010150232064.\", \"status\": \"NOT_FOUND\"}}"
- There is no official Google material (tutorials, documentation, model cards, etc.) stating whether batch predictions and context caching are supported when used together.
- If it is supported, it would be lovely to see a tutorial.
- If it is NOT supported, please add a disclaimer to the batch / context cache docs.
Describe alternatives you've considered
It is supported on OpenAI.
Additional context
No response
Code of Conduct
- I agree to follow this project's Code of Conduct