Closed
Labels: enhancement (New feature or request), research (Determine technical constraints)
Description
It would be really useful, especially for smaller applications, to be able to scale GPUs down to 0 when there is no traffic.
Possible approach
- To trigger 1 -> 0 scaling, check CloudWatch metrics for a period with no requests (period length user-configurable?).
- Scale 1 -> 0 by setting `deployment.spec.replicas` to 0.
- When scaling 1 -> 0, also update the Istio Virtual Service to route requests for that API to a new deployment running on the Cortex node (or to the existing operator).
- 0 -> 1 scaling is triggered when a request comes in to that service.
- Scale 0 -> 1 by setting `deployment.spec.replicas` to 1.
- Either the service holds onto the request until the pod is ready, forwards it, and replies with the response, or it responds immediately with a message such as "0 -> 1 scaling has been triggered, please try again in a few minutes".
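The scaling steps above can be sketched as a small decision function. This is a minimal sketch under stated assumptions, not Cortex's implementation: the names `decide_scaling` and `ScalingDecision`, and the idea of a stand-in "activator" service that receives requests while the API has 0 replicas, are hypothetical. The per-minute request counts are assumed to have already been fetched (for example from CloudWatch) for the period since the API was last scaled up.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ScalingDecision:
    replicas: int             # desired value for deployment.spec.replicas
    route_to_activator: bool  # True: Virtual Service sends traffic to the (hypothetical) activator


def decide_scaling(
    desired_replicas: int,
    ready_replicas: int,
    requests_per_minute: Sequence[int],
    idle_minutes: int,
    pending_requests: int,
) -> ScalingDecision:
    """Hypothetical scale-to-zero policy for one API.

    requests_per_minute: per-minute request counts since the API was last
        scaled up, oldest first (assumed to come from CloudWatch).
    idle_minutes: user-configurable window with no requests before 1 -> 0.
    pending_requests: requests currently held by the activator.
    """
    if idle_minutes < 1:
        raise ValueError("idle_minutes must be at least 1")

    if desired_replicas == 0:
        # Scaled to zero: any request waiting at the activator triggers 0 -> 1.
        replicas = 1 if pending_requests > 0 else 0
        return ScalingDecision(replicas=replicas, route_to_activator=True)

    if ready_replicas == 0:
        # Scale-up in progress: keep routing to the activator until a pod is ready.
        return ScalingDecision(replicas=desired_replicas, route_to_activator=True)

    window = requests_per_minute[-idle_minutes:]
    # Idle only if there is a full window of data and it contains no requests.
    if len(window) >= idle_minutes and sum(window) == 0:
        return ScalingDecision(replicas=0, route_to_activator=True)  # 1 -> 0

    return ScalingDecision(replicas=desired_replicas, route_to_activator=False)
```

Applying a decision would mean setting `deployment.spec.replicas` (for example with `kubectl scale deployment/<name> --replicas=0`) and pointing the Istio Virtual Service's route at either the API or the activator. Counting only requests since the last scale-up avoids sending a freshly woken API straight back to 0 before it has had a full idle window.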
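The two request-handling options in the last step can also be sketched. The asyncio code below is a minimal, hypothetical stand-in for the service that receives traffic while the API has 0 replicas. The `Activator` class, its `trigger_scale_up` and `forward` callables, the timeout, and the use of HTTP status 503 for the "try again" reply are assumptions for illustration, not part of Cortex or Istio.

```python
import asyncio
from typing import Awaitable, Callable, Tuple

TRY_AGAIN = "0 -> 1 scaling has been triggered, please try again in a few minutes"


class Activator:
    """Hypothetical service that receives requests while an API has 0 replicas."""

    def __init__(
        self,
        trigger_scale_up: Callable[[], None],       # e.g. sets deployment.spec.replicas to 1
        forward: Callable[[str], Awaitable[str]],   # sends a payload to the ready API pod
        hold_requests: bool,                        # True: wait for the pod; False: reply at once
        timeout_s: float,                           # max time to hold a request
    ) -> None:
        self._trigger_scale_up = trigger_scale_up
        self._forward = forward
        self._hold_requests = hold_requests
        self._timeout_s = timeout_s
        self._ready = asyncio.Event()
        self._scale_up_requested = False

    def mark_ready(self) -> None:
        """Called (e.g. by a readiness watcher) once the API pod is ready."""
        self._ready.set()

    async def handle(self, payload: str) -> Tuple[int, str]:
        # Only the first request triggers the 0 -> 1 scale-up.
        if not self._scale_up_requested:
            self._scale_up_requested = True
            self._trigger_scale_up()

        if not self._hold_requests:
            # Option 2: respond immediately and let the client retry.
            return 503, TRY_AGAIN

        # Option 1: hold the request until the pod is ready, then forward it.
        try:
            await asyncio.wait_for(self._ready.wait(), timeout=self._timeout_s)
        except asyncio.TimeoutError:
            return 503, TRY_AGAIN
        return 200, await self._forward(payload)
```

With `hold_requests=True`, every concurrent request waits on the same readiness event and is forwarded once the pod is up; the timeout bounds how long a client is held while a GPU node is provisioned. With `hold_requests=False`, the caller gets the "try again" message straight away. Once the pod is ready, the Virtual Service would be switched back to the API so the activator no longer sees traffic.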