diff --git a/docs/deployments/apis.md b/docs/deployments/apis.md
index aa0159ab31..1eed87bdf1 100644
--- a/docs/deployments/apis.md
+++ b/docs/deployments/apis.md
@@ -10,9 +10,9 @@ Serve models at scale.
   model: # path to an exported model (e.g. s3://my-bucket/exported_model)
   model_format: # model format, must be "tensorflow" or "onnx" (default: "onnx" if model path ends with .onnx, "tensorflow" if model path ends with .zip or is a directory)
   request_handler: # path to the request handler implementation file, relative to the cortex root
-  tf_signature_key: # name of the signature def to use for prediction (required if your model has more than one signature def)
+  tf_signature_key: # name of the signature def to use for prediction (required if your model has more than one signature def)
   tracker:
-    key: # json key to track if the response payload is a dictionary
+    key: # key to track (required if the response payload is a JSON object)
     model_type: # model type, must be "classification" or "regression"
   compute:
     min_replicas: # minimum number of replicas (default: 1)
@@ -43,6 +43,10 @@ Request handlers are used to decouple the interface of an API endpoint from its
 
 See [request handlers](request-handlers.md) for a detailed guide.
 
+## Prediction Monitoring
+
+`tracker` can be configured to collect API prediction metrics and display real-time stats in `cortex get `. The tracker looks for scalar values in the response payload (after the execution of the `post_inference` request handler, if provided). If the response payload is a JSON object, `key` can be set to extract the desired scalar value. For regression models, the tracker should be configured with `model_type: regression` to collect float values and display regression stats such as min, max and average. For classification models, the tracker should be configured with `model_type: classification` to collect integer or string values and display the class distribution.
+
 ## Debugging
 
 You can log more information about each request by adding a `?debug=true` parameter to your requests. This will print:
@@ -52,10 +56,10 @@ You can log more information about each request by adding a `?debug=true` parame
 3. The value after running inference
 4. The value after running the `post_inference` function (if applicable)
 
-## Autoscaling replicas
+## Autoscaling Replicas
 
 Cortex adjusts the number of replicas that are serving predictions by monitoring the compute resource usage of each API. The number of replicas will be at least `min_replicas` and no more than `max_replicas`.
 
-## Autoscaling nodes
+## Autoscaling Nodes
 
 Cortex spins up and down nodes based on the aggregate resource requests of all APIs. The number of nodes will be at least `$CORTEX_NODES_MIN` and no more than `$CORTEX_NODES_MAX` (configured during installation and modifiable via the [AWS console](https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-manual-scaling.html)).
diff --git a/examples/iris-classifier/cortex.yaml b/examples/iris-classifier/cortex.yaml
index e59a2f6c21..dcdce7b57f 100644
--- a/examples/iris-classifier/cortex.yaml
+++ b/examples/iris-classifier/cortex.yaml
@@ -5,23 +5,33 @@
   name: tensorflow
   model: s3://cortex-examples/iris/tensorflow
   request_handler: handlers/tensorflow.py
+  tracker:
+    model_type: classification
 
 - kind: api
   name: pytorch
   model: s3://cortex-examples/iris/pytorch.onnx
   request_handler: handlers/pytorch.py
+  tracker:
+    model_type: classification
 
 - kind: api
   name: keras
   model: s3://cortex-examples/iris/keras.onnx
   request_handler: handlers/keras.py
+  tracker:
+    model_type: classification
 
 - kind: api
   name: xgboost
   model: s3://cortex-examples/iris/xgboost.onnx
   request_handler: handlers/xgboost.py
+  tracker:
+    model_type: classification
 
 - kind: api
   name: sklearn
   model: s3://cortex-examples/iris/sklearn.onnx
   request_handler: handlers/sklearn.py
+  tracker:
+    model_type: classification
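Note: the iris examples above only set `model_type`, never `key`. For a classification API whose response payload is a JSON object, the documented fields would compose roughly like this (a sketch; the API name, model path, and `label` key are hypothetical, not taken from this diff):

```yaml
# Hypothetical API spec illustrating the tracker fields documented above.
# The name, model path, and "label" key are made up for this sketch.
- kind: api
  name: my-classifier
  model: s3://my-bucket/exported_model
  tracker:
    key: label                  # extract the scalar at response["label"]
    model_type: classification  # collect int/string values; show class distribution
```

For a regression API, the same block would use `model_type: regression`; `key` is only needed when the payload is a JSON object rather than a bare scalar.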