diff --git a/README.md b/README.md index b387a1ebd3..46ecc7c7c9 100644 --- a/README.md +++ b/README.md @@ -4,9 +4,7 @@ [Website](https://www.cortex.dev) • [Slack](https://community.cortex.dev) • [Docs](https://docs.cortex.dev) -
- -# Deploy, manage, and scale machine learning models in production +# Deploy machine learning models to production Cortex is a cloud native model serving platform for machine learning engineering teams. @@ -36,6 +34,8 @@ $ cortex deploy apis.yaml all APIs are ready! ``` +
+ ## Manage * Create A/B tests and shadow pipelines with configurable traffic splitting. @@ -51,11 +51,13 @@ image-classifier batch 64 video-analyzer async 16 ``` +
+ ## Scale * Configure workload and cluster autoscaling to efficiently handle large-scale production workloads. * Create clusters with different types of instances for different types of workloads. -* Spend less on cloud infrastructure by letting Cortex manage spot or preemptible instances. +* Spend less on cloud infrastructure by letting Cortex manage spot instances. ```text $ cortex cluster info diff --git a/dev/generate_cli_md.sh b/dev/generate_cli_md.sh index 7cc386e1e9..da462d92f7 100755 --- a/dev/generate_cli_md.sh +++ b/dev/generate_cli_md.sh @@ -42,9 +42,6 @@ commands=( "cluster configure" "cluster down" "cluster export" - "cluster-gcp up" - "cluster-gcp info" - "cluster-gcp down" "env configure" "env list" "env default" diff --git a/docs/README.md b/docs/README.md index 5160cdc8e2..8d172a3471 100644 --- a/docs/README.md +++ b/docs/README.md @@ -1 +1 @@ -**Please view our documentation at [docs.cortex.dev](https://docs.cortex.dev)** +**Please read our documentation at [docs.cortex.dev](https://docs.cortex.dev)** diff --git a/docs/clients/cli.md b/docs/clients/cli.md index b96c2262b1..507d669ebe 100644 --- a/docs/clients/cli.md +++ b/docs/clients/cli.md @@ -12,7 +12,7 @@ Flags: -e, --env string environment to use -f, --force override the in-progress api update -y, --yes skip prompts - -o, --output string output format: one of pretty|json (default "pretty") + -o, --output string output format: one of json (default "pretty") -h, --help help for deploy ``` @@ -27,7 +27,7 @@ Usage: Flags: -e, --env string environment to use -w, --watch re-run the command every 2 seconds - -o, --output string output format: one of pretty|json (default "pretty") + -o, --output string output format: one of json (default "pretty") -v, --verbose show additional information (only applies to pretty output format) -h, --help help for get ``` @@ -57,7 +57,7 @@ Usage: Flags: -e, --env string environment to use -f, --force override the in-progress api update - -o, --output string output format: one of pretty|json (default "pretty") + -o, --output string output format: one of json (default "pretty") -h, --help help for patch ``` @@ -72,7 +72,7 @@ Usage: Flags: -e, --env string environment to use -f, --force override the in-progress api update - -o, --output string output format: one of pretty|json (default "pretty") + -o, --output string output format: one of json (default "pretty") -h, --help help for refresh ``` @@ -88,7 +88,7 @@ Flags: -e, --env string environment to use -f, --force delete the api without confirmation -c, --keep-cache keep cached data for the api - -o, --output string output format: one of pretty|json (default "pretty") + -o, --output string output format: one of json (default "pretty") -h, --help help for delete ``` @@ -170,57 +170,6 @@ Flags: -h, --help help for export ``` -## cluster-gcp up - -```text -spin up a cluster on gcp - -Usage: - cortex cluster-gcp up [CLUSTER_CONFIG_FILE] [flags] - -Flags: - -e, --configure-env string name of environment to configure (default "gcp") - -y, --yes skip prompts - -h, --help help for up -``` - -## cluster-gcp info - -```text -get information about a cluster - -Usage: - cortex cluster-gcp info [flags] - -Flags: - -c, --config string path to a cluster configuration file - -n, --name string name of the cluster - -p, --project string gcp project id - -z, --zone string gcp zone of the cluster - -e, --configure-env string name of environment to configure - -d, --debug save the current cluster state to a file - -y, --yes skip prompts - -h, --help help for info 
-``` - -## cluster-gcp down - -```text -spin down a cluster - -Usage: - cortex cluster-gcp down [flags] - -Flags: - -c, --config string path to a cluster configuration file - -n, --name string name of the cluster - -p, --project string gcp project id - -z, --zone string gcp zone of the cluster - -y, --yes skip prompts - --keep-volumes keep cortex provisioned persistent volumes - -h, --help help for down -``` - ## env configure ```text @@ -243,7 +192,7 @@ Usage: cortex env list [flags] Flags: - -o, --output string output format: one of pretty|json (default "pretty") + -o, --output string output format: one of json (default "pretty") -h, --help help for list ``` diff --git a/docs/clients/install.md b/docs/clients/install.md index c5ac2cde8d..d93bc70f4a 100644 --- a/docs/clients/install.md +++ b/docs/clients/install.md @@ -33,4 +33,4 @@ By default, the Cortex CLI is installed at `/usr/local/bin/cortex`. To install t ## Changing the CLI/client configuration directory -By default, the Cortex CLI/client creates a directory at `~/.cortex/` and uses it to store environment configuration. To use a different directory, export the `CORTEX_CLI_CONFIG_DIR` environment variable before running any `cortex` commands. +By default, the CLI/client creates a directory at `~/.cortex/` and uses it to store environment configuration. To use a different directory, export the `CORTEX_CLI_CONFIG_DIR` environment variable before running any `cortex` commands. diff --git a/docs/clients/python.md b/docs/clients/python.md index bcb6491fe2..e86ba82a80 100644 --- a/docs/clients/python.md +++ b/docs/clients/python.md @@ -1,4 +1,4 @@ -# Python API +# Python client * [cortex](#cortex) * [client](#client) @@ -25,8 +25,7 @@ client(env: str = None) -> Client ``` -Initialize a client based on the specified environment. -If no environment name is passed, it will attempt using the default environment. +Initialize a client based on the specified environment. If no environment is specified, it will attempt to use the default environment. **Arguments**: @@ -43,17 +42,17 @@ If no environment name is passed, it will attempt using the default environment. new_client(name: str, operator_endpoint: str) -> Client ``` -Create a new environment to connect to an existing Cortex Cluster, and initialize a client to deploy and manage APIs on that cluster. +Create a new environment to connect to an existing cluster, and initialize a client to deploy and manage APIs on that cluster. **Arguments**: - `name` - Name of the environment to create. -- `operator_endpoint` - The endpoint for the operator of your Cortex Cluster. You can get this endpoint by running the CLI command `cortex cluster info` for an AWS provider or `cortex cluster-gcp info` for a GCP provider. +- `operator_endpoint` - The endpoint for the operator of your Cortex cluster. You can get this endpoint by running the CLI command `cortex cluster info`. **Returns**: - Cortex client that can be used to deploy and manage APIs on a Cortex Cluster. + Cortex client that can be used to deploy and manage APIs on a cluster. 
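+
+For example, a minimal usage sketch (the environment name and operator endpoint below are placeholders):
+
+```python
+import cortex
+
+# use the default environment
+client = cortex.client()
+
+# or create an environment that points to an existing cluster and connect to it
+# (get the operator endpoint by running `cortex cluster info`)
+client = cortex.new_client("prod", "https://<operator_endpoint>")
+```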
## env\_list diff --git a/docs/clusters/aws/kubectl.md b/docs/clusters/advanced/kubectl.md similarity index 82% rename from docs/clusters/aws/kubectl.md rename to docs/clusters/advanced/kubectl.md index d8ea309493..51f81860ff 100644 --- a/docs/clusters/aws/kubectl.md +++ b/docs/clusters/advanced/kubectl.md @@ -1,6 +1,6 @@ -# Setting up kubectl +# Setting up `kubectl` -## Install kubectl +## Install `kubectl` Follow these [instructions](https://kubernetes.io/docs/tasks/tools/install-kubectl). @@ -16,13 +16,13 @@ aws --version # should be >= 1.16 aws configure ``` -## Update kubeconfig +## Update `kubeconfig` ```bash aws eks update-kubeconfig --name= --region= ``` -## Test kubectl +## Test `kubectl` ```bash kubectl get pods diff --git a/docs/clusters/registry.md b/docs/clusters/advanced/registry.md similarity index 84% rename from docs/clusters/registry.md rename to docs/clusters/advanced/registry.md index 13e9758894..3f5de491e2 100644 --- a/docs/clusters/registry.md +++ b/docs/clusters/advanced/registry.md @@ -1,8 +1,8 @@ # Private Docker registry -## Install and configure kubectl +## Configuring `kubectl` -Follow the instructions for [AWS](aws/kubectl.md) or [GCP](gcp/kubectl.md). +Follow the instructions [here](kubectl.md). ## Setting credentials diff --git a/docs/clusters/aws/multi-instance-type.md b/docs/clusters/aws/multi-instance-type.md deleted file mode 100644 index 43c56c5b4f..0000000000 --- a/docs/clusters/aws/multi-instance-type.md +++ /dev/null @@ -1,81 +0,0 @@ -# Multi-instance type clusters - -The cluster can be configured to provision different instance types depending on what resources the APIs request. The multi instance type cluster has the following advantages over the single-instance type cluster: - -* **Lower costs**: Reduced overall compute costs by using the most economical instance for the given workloads. -* **Simpler logistics**: Managing multiple clusters on your own is no longer required. -* **Multi-purpose cluster**: The cluster can now take any range of workloads. One cluster for everything. Just throw a bunch of node groups in the cluster config, and you’re set. - -## Best practices - -When specifying the node groups in your `cluster.yaml` config, keep in mind that node groups with lower indexes have a higher priority over the other ones. With that mind, the best practices that result from this are: - -1. Node groups with smaller instances should have the higher priority. -1. Node groups with CPU-only instances should come before the node groups equipped with GPU/Inferentia instances. -1. The spot node groups should always come first over the ones that have on-demand instances. - -## Example node groups - -### CPU spot/on-demand with GPU on-demand - -```yaml -# cluster.yaml - -node_groups: - - name: cpu-spot - instance_type: m5.large - spot: true - - name: cpu - instance_type: m5.large - - name: gpu - instance_type: g4dn.xlarge -``` - -### CPU on-demand, GPU on-demand and Inferentia on-demand - -```yaml -# cluster.yaml - -node_groups: - - name: cpu - instance_type: m5.large - - name: gpu - instance_type: g4dn.xlarge - - name: inferentia - instance_type: inf.xlarge -``` - -### 3 spot CPU node groups with 1 on-demand CPU - -```yaml -# cluster.yaml - -node_groups: - - name: cpu-0 - instance_type: t3.medium - spot: true - - name: cpu-1 - instance_type: m5.2xlarge - spot: true - - name: cpu-2 - instance_type: m5.8xlarge - spot: true - - name: cpu-3 - instance_type: m5.24xlarge -``` - -The above can also be achieved with the following config. 
- -```yaml -# cluster.yaml - -node_groups: - - name: cpu-0 - instance_type: t3.medium - spot: true - spot_config: - instance_distribution: [m5.2xlarge, m5.8xlarge] - max_price: 3.27 - - name: cpu-1 - instance_type: m5.24xlarge -``` diff --git a/docs/clusters/aws/security.md b/docs/clusters/aws/security.md deleted file mode 100644 index fe539d5185..0000000000 --- a/docs/clusters/aws/security.md +++ /dev/null @@ -1,13 +0,0 @@ -# Security - -## Private cluster subnets - -By default, instances are created in public subnets and are assigned public IP addresses. You can configure all instances in your cluster to use private subnets by setting `subnet_visibility: private` in your [cluster configuration](install.md) file before creating your cluster. If private subnets are used, instances will not have public IP addresses, and Cortex will create a NAT gateway to allow outgoing network requests. - -## Private APIs - -See [networking](networking/index.md) for a discussion of API visibility. - -## Private operator - -By default, the Cortex cluster operator's load balancer is internet-facing, and therefore publicly accessible (the operator is what the `cortex` CLI connects to). The operator's load balancer can be configured to be private by setting `operator_load_balancer_scheme: internal` in your [cluster configuration](install.md) file. If you do this, you will need to configure [VPC Peering](networking/vpc-peering.md) to allow your CLI to connect to the Cortex operator (this will be necessary to run any `cortex` commands). diff --git a/docs/clusters/gcp/credentials.md b/docs/clusters/gcp/credentials.md deleted file mode 100644 index d3cf4f6e11..0000000000 --- a/docs/clusters/gcp/credentials.md +++ /dev/null @@ -1,10 +0,0 @@ -# Credentials - -1. Create a service account for your GCP project as described [here](https://cloud.google.com/iam/docs/creating-managing-service-accounts#iam-service-accounts-create-console) with the following roles (these roles could be more restrictive if required): - 1. `Editor` role. - 1. `Kubernetes Engine Admin` role. - 1. `Container Registry Service Agent` role. - 1. `Storage Admin` role. - 1. `Storage Object Admin` role. -1. Generate a service account key for your service account as described [here](https://cloud.google.com/iam/docs/creating-managing-service-account-keys) and export it as a JSON file. -1. Export the `GOOGLE_APPLICATION_CREDENTIALS` variable and point it to the downloaded service account key from the previous step. For example: `export GOOGLE_APPLICATION_CREDENTIALS=/home/ubuntu/.config/gcloud/sample-269400-9a41792a969b.json` diff --git a/docs/clusters/gcp/install.md b/docs/clusters/gcp/install.md deleted file mode 100644 index 13d7c459a9..0000000000 --- a/docs/clusters/gcp/install.md +++ /dev/null @@ -1,88 +0,0 @@ -# Install - -## Prerequisites - -1. [Docker](https://docs.docker.com/install) must be installed and running on your machine (to verify, check that running `docker ps` does not return an error) -1. You may need to [request a quota increase](https://cloud.google.com/compute/quotas) for your desired instance type and/or GPU type -1. Export the `GOOGLE_APPLICATION_CREDENTIALS` environment variable, containing the path to your GCP credentials file (e.g. `export GOOGLE_APPLICATION_CREDENTIALS=~/.config/gcloud/myproject-8a41417a968a.json`) -1. 
If you haven't done so already, enable the Kubernetes Engine API in your GCP project ([here](https://console.developers.google.com/apis/api/container.googleapis.com/overview)) - -## Spin up Cortex on your GCP account - -```bash -# install the CLI -pip install cortex - -# spin up Cortex on your GCP account -cortex cluster-gcp up cluster.yaml # (see configuration options below) -``` - -## Configure Cortex - -```yaml -# cluster.yaml - -# GKE cluster name -cluster_name: cortex - -# GCP project ID -project: - -# GCP zone for your cluster -zone: us-east1-c - -# list of cluster node pools; the smaller index, the higher the priority of the node pool -node_pools: - - name: np-cpu # name of the node pool - instance_type: n1-standard-2 # instance type - # accelerator_type: nvidia-tesla-t4 # GPU to attach to your instance (optional) - # accelerators_per_instance: 1 # the number of GPUs to attach to each instance (optional) - min_instances: 1 # minimum number of instances - max_instances: 5 # maximum number of instances - preemptible: false # enable the use of preemptible instances - - - name: np-gpu - instance_type: n1-standard-2 - accelerator_type: nvidia-tesla-t4 - accelerators_per_instance: 1 - min_instances: 1 - max_instances: 5 - preemptible: false - ... - -# the name of the network in which to create your cluster -# network: default - -# the name of the subnetwork in which to create your cluster -# subnet: default - -# API load balancer scheme [internet-facing | internal] -api_load_balancer_scheme: internet-facing - -# operator load balancer scheme [internet-facing | internal] -# note: if using "internal", you must be within the cluster's VPC or configure VPC Peering to connect your CLI to your cluster operator -operator_load_balancer_scheme: internet-facing -``` - -The docker images used by the Cortex cluster can also be overridden, although this is not common. They can be configured by adding any of these keys to your cluster configuration file (default values are shown): - - -```yaml -image_operator: quay.io/cortexlabs/operator:master -image_manager: quay.io/cortexlabs/manager:master -image_downloader: quay.io/cortexlabs/downloader:master -image_request_monitor: quay.io/cortexlabs/request-monitor:master -image_istio_proxy: quay.io/cortexlabs/istio-proxy:master -image_istio_pilot: quay.io/cortexlabs/istio-pilot:master -image_google_pause: quay.io/cortexlabs/google-pause:master -image_prometheus: quay.io/cortexlabs/prometheus:master -image_prometheus_config_reloader: quay.io/cortexlabs/prometheus-config-reloader:master -image_prometheus_operator: quay.io/cortexlabs/prometheus-operator:master -image_prometheus_statsd_exporter: quay.io/cortexlabs/prometheus-statsd-exporter:master -image_prometheus_dcgm_exporter: quay.io/cortexlabs/prometheus-dcgm-exporter:master -image_prometheus_kube_state_metrics: quay.io/cortexlabs/prometheus-kube-state-metrics:master -image_prometheus_node_exporter: quay.io/cortexlabs/prometheus-node-exporter:master -image_kube_rbac_proxy: quay.io/cortexlabs/kube-rbac-proxy:master -image_grafana: quay.io/cortexlabs/grafana:master -image_event_exporter: quay.io/cortexlabs/event-exporter:master -``` diff --git a/docs/clusters/gcp/kubectl.md b/docs/clusters/gcp/kubectl.md deleted file mode 100644 index 3537161709..0000000000 --- a/docs/clusters/gcp/kubectl.md +++ /dev/null @@ -1,21 +0,0 @@ -# Setting up kubectl - -## Install kubectl - -Follow these [instructions](https://kubernetes.io/docs/tasks/tools/install-kubectl). 
- -## Install gcloud - -Follow these [instructions](https://cloud.google.com/sdk/docs/install). - -## Update kubeconfig - -```bash -gcloud container clusters get-credentials --zone --project -``` - -## Test kubectl - -```bash -kubectl get pods -``` diff --git a/docs/clusters/gcp/multi-instance-type.md b/docs/clusters/gcp/multi-instance-type.md deleted file mode 100644 index ae36dc4301..0000000000 --- a/docs/clusters/gcp/multi-instance-type.md +++ /dev/null @@ -1,69 +0,0 @@ -# Multi-instance type clusters - -The cluster can be configured to provision different instance types depending on what resources the APIs request. The multi instance type cluster has the following advantages over the single-instance type cluster: - -* **Lower costs**: Reduced overall compute costs by using the most economical instance for the given workloads. -* **Simpler logistics**: Managing multiple clusters on your own is no longer required. -* **Multi-purpose cluster**: The cluster can now take any range of workloads. One cluster for everything. Just throw a bunch of node pools in the cluster config, and you’re set. - -## Best practices - -When specifying the node pools in your `cluster.yaml` config, keep in mind that node pools with lower indexes have a higher priority over the other ones. With that mind, the best practices that result from this are: - -1. Node pools with smaller instances should have the higher priority. -1. Node pools with CPU-only instances should come before the node pools equipped with GPU instances. -1. The preemptible node pools should always come first over the ones that have on-demand instances. - -## Example node pools - -### CPU preemptible/on-demand with GPU on-demand - -```yaml -# cluster.yaml - -node_pools: - - name: cpu-preempt - instance_type: e2-standard-2 - preemptible: true - - name: cpu - instance_type: e2-standard-2 - - name: gpu - instance_type: e2-standard-2 - accelerator_type: nvidia-tesla-t4 -``` - -### CPU on-demand with 2 GPU on-demand - -```yaml -# cluster.yaml - -node_pools: - - name: cpu - instance_type: e2-standard-2 - - name: gpu-small - instance_type: e2-standard-2 - accelerator_type: nvidia-tesla-t4 - - name: gpu-large - instance_type: e2-standard-2 - accelerator_type: nvidia-tesla-t4 - accelerators_per_instance: 4 -``` - -### 3 preemptible CPU node pools with 1 on-demand CPU - -```yaml -# cluster.yaml - -node_pools: - - name: cpu-0 - instance_type: e2-standard-2 - preemptible: true - - name: cpu-1 - instance_type: e2-standard-4 - preemptible: true - - name: cpu-2 - instance_type: e2-standard-8 - preemptible: true - - name: cpu-3 - instance_type: e2-standard-32 -``` diff --git a/docs/clusters/gcp/uninstall.md b/docs/clusters/gcp/uninstall.md deleted file mode 100644 index 493f00469c..0000000000 --- a/docs/clusters/gcp/uninstall.md +++ /dev/null @@ -1,17 +0,0 @@ -# Uninstall - -Since you may wish to have access to your data after spinning down your cluster, Cortex's bucket, stackdriver logs, and -Prometheus volume are not automatically deleted when running `cortex cluster-gcp down --config cluster.yaml`. - -```bash -cortex cluster-gcp down --config cluster.yaml -``` - -The `cortex cluster-gcp down --config cluster.yaml` command doesn't wait for the cluster to spin down. You can ensure that the cluster has -spun down by checking the GKE console. - -## Keep Cortex Volumes - -The volumes used by Cortex's Prometheus and Grafana instances are deleted by default on a cluster down operation. 
-If you want to keep the metrics and dashboards volumes for any reason, -you can pass the `--keep-volumes` flag to the `cortex cluster-gcp down --config cluster.yaml` command. diff --git a/docs/clusters/instances/multi.md b/docs/clusters/instances/multi.md new file mode 100644 index 0000000000..01d015fe30 --- /dev/null +++ b/docs/clusters/instances/multi.md @@ -0,0 +1,61 @@ +# Multi-instance type clusters + +Cortex can be configured to provision different instance types to improve workload performance and reduce cloud infrastructure spend. + +## Best practices + +**Node groups with lower indices have higher priority.** + +1. Small instance node groups should be listed before large instance node groups. +1. CPU node groups should be listed before GPU/Inferentia node groups. +1. Spot node groups should always be listed before on-demand node groups. + +## Examples + +### CPU spot, CPU on-demand, and GPU on-demand + +```yaml +# cluster.yaml + +node_groups: + - name: cpu-spot + instance_type: m5.large + spot: true + - name: cpu-on-demand + instance_type: m5.large + - name: gpu-on-demand + instance_type: g4dn.xlarge +``` + +### CPU on-demand, GPU on-demand, and Inferentia on-demand + +```yaml +# cluster.yaml + +node_groups: + - name: cpu-on-demand + instance_type: m5.large + - name: gpu-on-demand + instance_type: g4dn.xlarge + - name: inferentia-on-demand + instance_type: inf.xlarge +``` + +### 3 CPU spot and 1 CPU on-demand + +```yaml +# cluster.yaml + +node_groups: + - name: cpu-1 + instance_type: t3.medium + spot: true + - name: cpu-2 + instance_type: m5.2xlarge + spot: true + - name: cpu-3 + instance_type: m5.8xlarge + spot: true + - name: cpu-4 + instance_type: m5.24xlarge +``` diff --git a/docs/clusters/aws/spot.md b/docs/clusters/instances/spot.md similarity index 99% rename from docs/clusters/aws/spot.md rename to docs/clusters/instances/spot.md index 151b0f2ce2..9c650ae0dc 100644 --- a/docs/clusters/aws/spot.md +++ b/docs/clusters/instances/spot.md @@ -4,7 +4,7 @@ # cluster.yaml node_groups: - - name: node-group-0 + - name: node-group-1 # whether to use spot instances for this node group (default: false) spot: false diff --git a/docs/clusters/aws/auth.md b/docs/clusters/management/auth.md similarity index 98% rename from docs/clusters/aws/auth.md rename to docs/clusters/management/auth.md index 5090928a69..fb0bf2cdfc 100644 --- a/docs/clusters/aws/auth.md +++ b/docs/clusters/management/auth.md @@ -1,6 +1,6 @@ # Auth -## Cortex Client +## Client Cortex client uses the default credential provider chain to get credentials. Credentials will be read in the following order of precedence: @@ -8,13 +8,13 @@ Cortex client uses the default credential provider chain to get credentials. Cre - the name of the profile specified by `AWS_PROFILE` environment variable - `default` profile from `~/.aws/credentials` -### API Management +### API management Cortex client relies on AWS IAM to authenticate requests (e.g. `cortex deploy`, `cortex get`) to a cluster on AWS. The client will include a get-caller-identity request that has been signed with the credentials from the default credential provider chain along with original request. The operator executes the presigned request to verify that credentials are valid and belong to the same account as the IAM entity of the cluster. AWS credentials required to authenticate cortex client requests to the operator don't require any permissions. However, managing the cluster using `cortex cluster *` commands do require permissions. 
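+
+For example, credentials from the default credential provider chain can be supplied in either of these ways (values shown are placeholders):
+
+```bash
+# environment variables
+export AWS_ACCESS_KEY_ID=***
+export AWS_SECRET_ACCESS_KEY=***
+
+# or a named profile from ~/.aws/credentials
+export AWS_PROFILE=cortex
+```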
-### Cluster Management +### Cluster management It is recommended that your AWS credentials have AdminstratorAccess before running `cortex cluster *` commands. diff --git a/docs/clusters/aws/install.md b/docs/clusters/management/create.md similarity index 81% rename from docs/clusters/aws/install.md rename to docs/clusters/management/create.md index d04d84a845..0306dd0855 100644 --- a/docs/clusters/aws/install.md +++ b/docs/clusters/management/create.md @@ -2,27 +2,25 @@ ## Prerequisites -1. [Docker](https://docs.docker.com/install) must be installed and running on your machine (to verify, check that running `docker ps` does not return an error) -1. Subscribe to the [EKS-optimized AMI with GPU Support](https://aws.amazon.com/marketplace/pp/B07GRHFXGM) (for GPU clusters) -1. An IAM user with `AdministratorAccess` and programmatic access (see [security](security.md) if you'd like to use less privileged credentials after spinning up your cluster) -1. You may need to [request a limit increase](https://console.aws.amazon.com/servicequotas/home?#!/services/ec2/quotas) for your desired instance type +1. Install and run [Docker](https://docs.docker.com/install) on your machine. +1. Subscribe to the [AMI with GPU support](https://aws.amazon.com/marketplace/pp/B07GRHFXGM) (for GPU clusters). +1. Create an IAM user with `AdministratorAccess` and programmatic access. +1. You may need to [request limit increases](https://console.aws.amazon.com/servicequotas/home?#!/services/ec2/quotas) for your desired instance types. -## Spin up Cortex on your AWS account +## Create a cluster on your AWS account ```bash # install the CLI pip install cortex -# spin up Cortex on your AWS account -cortex cluster up cluster.yaml # (see configuration options below) +# create a cluster +cortex cluster up cluster.yaml ``` -## Configure Cortex +## `cluster.yaml` ```yaml -# cluster.yaml - -# EKS cluster name +# cluster name cluster_name: cortex # AWS region @@ -97,7 +95,7 @@ iam_policy_arns: ["arn:aws:iam::aws:policy/AmazonS3FullAccess"] vpc_cidr: 192.168.0.0/16 ``` -The docker images used by the Cortex cluster can also be overridden, although this is not common. They can be configured by adding any of these keys to your cluster configuration file (default values are shown): +The docker images used by the cluster can also be overridden. They can be configured by adding any of these keys to your cluster configuration file (default values are shown): ```yaml diff --git a/docs/clusters/aws/uninstall.md b/docs/clusters/management/delete.md similarity index 97% rename from docs/clusters/aws/uninstall.md rename to docs/clusters/management/delete.md index ac0a51da7f..b9d6334ac0 100644 --- a/docs/clusters/aws/uninstall.md +++ b/docs/clusters/management/delete.md @@ -29,7 +29,7 @@ aws logs describe-log-groups --log-group-name-prefix= --query logG ## Delete Certificates If you've configured a custom domain for your APIs, you can remove the SSL Certificate and Hosted Zone for the domain by -following these [instructions](networking/custom-domain.md#cleanup). +following these [instructions](../networking/custom-domain.md#cleanup). 
## Keep Cortex Volumes diff --git a/docs/clients/environments.md b/docs/clusters/management/environments.md similarity index 80% rename from docs/clients/environments.md rename to docs/clusters/management/environments.md index 2c8fcdbce7..437b821037 100644 --- a/docs/clients/environments.md +++ b/docs/clusters/management/environments.md @@ -1,6 +1,6 @@ # Environments -When you create a cluster with `cortex cluster up`, an environment named `aws` or `gcp` is automatically created to point to your cluster and is configured to be the default environment. You can name the environment something else via the `--configure-env` flag, e.g. `cortex cluster up --configure-env prod`. You can also use the `--configure-env` flag with `cortex cluster info` and `cortex cluster configure` to create / update the specified environment. +When you create a cluster with `cortex cluster up`, an environment named `aws` is automatically created to point to your cluster and is configured to be the default environment. You can name the environment something else via the `--configure-env` flag, e.g. `cortex cluster up --configure-env prod`. You can also use the `--configure-env` flag with `cortex cluster info` and `cortex cluster configure` to create / update the specified environment. You can list your environments with `cortex env list`, change the default environment with `cortex env default`, delete an environment with `cortex env delete`, and create/update an environment with `cortex env configure`. diff --git a/docs/clusters/aws/update.md b/docs/clusters/management/update.md similarity index 90% rename from docs/clusters/aws/update.md rename to docs/clusters/management/update.md index da218c89ce..93566682e5 100644 --- a/docs/clusters/aws/update.md +++ b/docs/clusters/management/update.md @@ -1,16 +1,16 @@ # Update -## Update Cortex configuration +## Update configuration ```bash cortex cluster configure cluster.yaml ``` -## Upgrade to a newer version of Cortex +## Upgrade to a newer version ```bash # spin down your cluster -cortex cluster down --config cluster.yaml # or just pass in the name and region of the cluster +cortex cluster down --name --region # update your CLI to the latest version pip install --upgrade cortex diff --git a/docs/clusters/aws/networking/custom-domain.md b/docs/clusters/networking/custom-domain.md similarity index 100% rename from docs/clusters/aws/networking/custom-domain.md rename to docs/clusters/networking/custom-domain.md diff --git a/docs/clusters/aws/networking/https.md b/docs/clusters/networking/https.md similarity index 99% rename from docs/clusters/aws/networking/https.md rename to docs/clusters/networking/https.md index 54cdf819a2..77df0ac276 100644 --- a/docs/clusters/aws/networking/https.md +++ b/docs/clusters/networking/https.md @@ -1,4 +1,4 @@ -# HTTPS (via API Gateway) +# HTTPS If you would like to support HTTPS endpoints for your Cortex APIs, here are a few options: diff --git a/docs/clusters/aws/networking/index.md b/docs/clusters/networking/load-balancers.md similarity index 98% rename from docs/clusters/aws/networking/index.md rename to docs/clusters/networking/load-balancers.md index 1388c5cc18..c67da5e2fd 100644 --- a/docs/clusters/aws/networking/index.md +++ b/docs/clusters/networking/load-balancers.md @@ -1,4 +1,4 @@ -# Networking +# Load balancers ![api architecture diagram](https://user-images.githubusercontent.com/808475/103417256-dd6e9700-4b3e-11eb-901e-90425f1f8fd4.png) diff --git a/docs/clusters/aws/networking/vpc-peering.md 
b/docs/clusters/networking/vpc-peering.md similarity index 100% rename from docs/clusters/aws/networking/vpc-peering.md rename to docs/clusters/networking/vpc-peering.md diff --git a/docs/workloads/observability/logging.md b/docs/clusters/observability/logging.md similarity index 51% rename from docs/workloads/observability/logging.md rename to docs/clusters/observability/logging.md index 54139582c9..c9dfea72ff 100644 --- a/docs/workloads/observability/logging.md +++ b/docs/clusters/observability/logging.md @@ -1,16 +1,12 @@ # Logging -Cortex provides a logging solution, out-of-the-box, without the need to configure anything. By default, logs are -collected with FluentBit, on every API kind, and are exported to each cloud provider logging solution. It is also -possible to view the logs of a single API replica, while developing, through the `cortex logs` command. +By default, logs are collected with Fluent Bit and are exported to CloudWatch. It is also possible to view the logs of a single replica using the `cortex logs` command. -## Cortex logs command +## `cortex logs` -The cortex CLI tool provides a command to quickly check the logs for a single API replica while debugging. +The CLI includes a command to get the logs for a single API replica for debugging purposes: -To check the logs of an API run one of the following commands: - -```shell +```bash # RealtimeAPI cortex logs @@ -23,9 +19,7 @@ solution. ## Logs on AWS -For AWS clusters, logs will be pushed to [CloudWatch](https://console.aws.amazon.com/cloudwatch/home) using fluent-bit. -A log group with the same name as your cluster will be created to store your logs. API logs are tagged with labels to -help with log aggregation and filtering. +Logs will automatically be pushed to CloudWatch and a log group with the same name as your cluster will be created to store your logs. API logs are tagged with labels to help with log aggregation and filtering. Below are some sample CloudWatch Log Insight queries: @@ -71,35 +65,6 @@ fields @timestamp, message | limit 1000 ``` -## Logs on GCP - -Logs will be pushed to [StackDriver](https://console.cloud.google.com/logs/query) using fluent-bit. API logs are tagged -with labels to help with log aggregation and filtering. - -Below are some sample Stackdriver queries: - -**RealtimeAPI:** - -```text -resource.type="k8s_container" -resource.labels.cluster_name="" -labels.apiKind="RealtimeAPI" -labels.apiName="" -``` - -**TaskAPI:** - -```text -resource.type="k8s_container" -resource.labels.cluster_name="" -labels.apiKind="TaskAPI" -labels.apiName="" -labels.jobID="" -``` - -Please make sure to navigate to the project containing your cluster and adjust the time range accordingly before running -queries. - ## Structured logging You can use Cortex's logger in your Python code to log in JSON, which will enrich your logs with Cortex's metadata, and diff --git a/docs/workloads/observability/metrics.md b/docs/clusters/observability/metrics.md similarity index 85% rename from docs/workloads/observability/metrics.md rename to docs/clusters/observability/metrics.md index 6382ac393f..30c165796b 100644 --- a/docs/workloads/observability/metrics.md +++ b/docs/clusters/observability/metrics.md @@ -1,8 +1,6 @@ # Metrics -A cortex cluster includes a deployment of Prometheus for metrics collections and a deployment of Grafana for -visualization. You can monitor your APIs with the Grafana dashboards that ship with Cortex, or even add custom metrics -and dashboards. 
+Cortex includes Prometheus for metrics collection and Grafana for visualization. You can monitor your APIs with the default Grafana dashboards, or create custom metrics and dashboards. ## Accessing the dashboard @@ -11,19 +9,18 @@ The dashboard URL is displayed once you run a `cortex get ` command. Alternatively, you can access it on `http:///dashboard`. Run the following command to get the operator URL: -```shell +```bash cortex env list ``` If your operator load balancer is configured to be internal, there are a few options for accessing the dashboard: 1. Access the dashboard from a machine that has VPC Peering configured to your cluster's VPC, or which is inside of your - cluster's VPC + cluster's VPC. 1. Run `kubectl port-forward -n default grafana-0 3000:3000` to forward Grafana's port to your local machine, and access - the dashboard on [http://localhost:3000/](http://localhost:3000/) (see instructions for setting up `kubectl` - on [AWS](../../clusters/aws/kubectl.md) or [GCP](../../clusters/gcp/kubectl.md)) + the dashboard on [http://localhost:3000](http://localhost:3000) (see instructions for setting up `kubectl` [here](../advanced/kubectl.md)). 1. Set up VPN access to your cluster's - VPC ([AWS docs](https://docs.aws.amazon.com/vpc/latest/userguide/vpn-connections.html)) + VPC ([docs](https://docs.aws.amazon.com/vpc/latest/userguide/vpn-connections.html)). ### Default credentials @@ -35,7 +32,7 @@ The dashboard is protected with username / password authentication, which by def You will be prompted to change the admin user password in the first time you log in. Grafana allows managing the access of several users and managing teams. For more information on this topic check -the [grafana documentation](https://grafana.com/docs/grafana/latest/manage-users/). +the [grafana documentation](https://grafana.com/docs/grafana/latest/manage-users). ### Selecting an API @@ -80,10 +77,10 @@ you to create a custom metrics from your deployed API that can be later be used Code examples on how to use custom metrics for each API kind can be found here: -- [RealtimeAPI](../realtime/metrics.md#custom-user-metrics) -- [RealtimeAPI](../async/metrics.md#custom-user-metrics) -- [BatchAPI](../batch/metrics.md#custom-user-metrics) -- [TaskAPI](../task/metrics.md#custom-user-metrics) +- [RealtimeAPI](../../workloads/realtime/metrics.md#custom-user-metrics) +- [AsyncAPI](../../workloads/async/metrics.md#custom-user-metrics) +- [BatchAPI](../../workloads/batch/metrics.md#custom-user-metrics) +- [TaskAPI](../../workloads/task/metrics.md#custom-user-metrics) ### Metric types diff --git a/docs/start.md b/docs/start.md index f710673e53..caed3f77c4 100644 --- a/docs/start.md +++ b/docs/start.md @@ -1,20 +1,27 @@ # Get started -## Install the CLI +## Create a cluster on your AWS account ```bash +# install the CLI pip install cortex -``` -See [here](clients/install.md) for alternative installation options. +# create a cluster +cortex cluster up cluster.yaml +``` -## Create a cluster +* [Client installation](clients/install.md) - customize your client installation. +* [Cluster configuration](clusters/management/create.md) - optimize your cluster for your workloads. +* [Environments](clusters/management/environments.md) - manage multiple clusters. 
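+
+A minimal `cluster.yaml` might look like the following (the region, node group names, and instance types are illustrative; see the cluster configuration docs for all options):
+
+```yaml
+# cluster.yaml
+
+cluster_name: cortex
+region: us-west-2
+
+node_groups:
+  - name: cpu
+    instance_type: m5.large
+  - name: gpu
+    instance_type: g4dn.xlarge
+```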
-* [Launch a Cortex cluster on your AWS account](clusters/aws/install.md) -* [Launch a Cortex cluster on your GCP account](clusters/gcp/install.md) +## Run machine learning workloads at scale -## Run machine learning workloads +```bash +# deploy machine learning APIs +cortex deploy apis.yaml +``` -* [Realtime API](workloads/realtime/example.md) -* [Batch API](workloads/batch/example.md) -* [Task API](workloads/task/example.md) +* [RealtimeAPI](workloads/realtime/example.md) - create APIs that respond to prediction requests in real-time. +* [AsyncAPI](workloads/async/example.md) - create APIs that respond to prediction requests asynchronously. +* [BatchAPI](workloads/batch/example.md) - create APIs that run distributed batch inference jobs. +* [TaskAPI](workloads/task/example.md) - create APIs that run training or fine-tuning jobs. diff --git a/docs/summary.md b/docs/summary.md index 97eb138a55..d8ec98350a 100644 --- a/docs/summary.md +++ b/docs/summary.md @@ -2,13 +2,28 @@ * [Get started](start.md) -## Clients +## Clusters -* [Install](clients/install.md) -* [CLI commands](clients/cli.md) -* [Python API](clients/python.md) -* [Environments](clients/environments.md) -* [Uninstall](clients/uninstall.md) +* Management + * [Auth](clusters/management/auth.md) + * [Create](clusters/management/create.md) + * [Update](clusters/management/update.md) + * [Delete](clusters/management/delete.md) + * [Environments](clusters/management/environments.md) +* Instances + * [Multi-instance](clusters/instances/multi.md) + * [Spot instances](clusters/instances/spot.md) +* Observability + * [Logging](clusters/observability/logging.md) + * [Metrics](clusters/observability/metrics.md) +* Networking + * [Load balancers](clusters/networking/load-balancers.md) + * [VPC peering](clusters/networking/vpc-peering.md) + * [HTTPS](clusters/networking/https.md) + * [Custom domain](clusters/networking/custom-domain.md) +* Advanced + * [Setting up kubectl](clusters/advanced/kubectl.md) + * [Private Docker registry](clusters/advanced/registry.md) ## Workloads @@ -56,29 +71,10 @@ * [Python packages](workloads/dependencies/python-packages.md) * [System packages](workloads/dependencies/system-packages.md) * [Custom images](workloads/dependencies/images.md) -* Observability - * [Logging](workloads/observability/logging.md) - * [Metrics](workloads/observability/metrics.md) -## Clusters +## Clients -* AWS - * [Install](clusters/aws/install.md) - * [Update](clusters/aws/update.md) - * [Auth](clusters/aws/auth.md) - * [Security](clusters/aws/security.md) - * [Multi-instance type](clusters/aws/multi-instance-type.md) - * [Spot instances](clusters/aws/spot.md) - * [Networking](clusters/aws/networking/index.md) - * [Custom domain](clusters/aws/networking/custom-domain.md) - * [HTTPS (via API Gateway)](clusters/aws/networking/https.md) - * [VPC peering](clusters/aws/networking/vpc-peering.md) - * [Setting up kubectl](clusters/aws/kubectl.md) - * [Uninstall](clusters/aws/uninstall.md) -* GCP - * [Install](clusters/gcp/install.md) - * [Credentials](clusters/gcp/credentials.md) - * [Multi-instance type](clusters/gcp/multi-instance-type.md) - * [Setting up kubectl](clusters/gcp/kubectl.md) - * [Uninstall](clusters/gcp/uninstall.md) -* [Private Docker registry](clusters/registry.md) +* [Install](clients/install.md) +* [Uninstall](clients/uninstall.md) +* [CLI commands](clients/cli.md) +* [Python client](clients/python.md) diff --git a/docs/workloads/async/example.md b/docs/workloads/async/example.md index 9229bd156a..4b42a0d7e1 100644 --- 
a/docs/workloads/async/example.md +++ b/docs/workloads/async/example.md @@ -7,7 +7,7 @@ Create APIs that process your workloads asynchronously. Create a folder for your API. In this case, we are deploying an iris-classifier AsyncAPI. This folder will have the following structure: -```shell +```text ./iris-classifier ├── cortex.yaml ├── predictor.py diff --git a/docs/workloads/async/metrics.md b/docs/workloads/async/metrics.md index 7f0ca7fd2a..69fa1e24a8 100644 --- a/docs/workloads/async/metrics.md +++ b/docs/workloads/async/metrics.md @@ -24,8 +24,5 @@ class PythonPredictor: self.metrics.histogram(metric="my_histogram", value=100, tags={"model": "v1"}) ``` -Refer to the [observability documentation](../observability/metrics.md#custom-user-metrics) for more information on -custom metrics. - **Note**: The metrics client uses the UDP protocol to push metrics, to be fault tolerant, so if it fails during a metrics push there is no exception thrown. diff --git a/docs/workloads/batch/metrics.md b/docs/workloads/batch/metrics.md index 95d3817d42..bb09335945 100644 --- a/docs/workloads/batch/metrics.md +++ b/docs/workloads/batch/metrics.md @@ -25,8 +25,5 @@ class PythonPredictor: self.metrics.histogram(metric="my_histogram", value=100, tags={"model": "v1"}) ``` -Refer to the [observability documentation](../observability/metrics.md#custom-user-metrics) for more information on -custom metrics. - **Note**: The metrics client uses the UDP protocol to push metrics, to be fault tolerant, so if it fails during a metrics push there is no exception thrown. diff --git a/docs/workloads/realtime/example.md b/docs/workloads/realtime/example.md index 888cbfb126..0816bb0c54 100644 --- a/docs/workloads/realtime/example.md +++ b/docs/workloads/realtime/example.md @@ -1,6 +1,6 @@ # RealtimeAPI -Create APIs that can respond to prediction requests in real-time. +Create APIs that respond to prediction requests in real-time. ## Implement diff --git a/docs/workloads/realtime/metrics.md b/docs/workloads/realtime/metrics.md index 9ef19306c6..c37bef259a 100644 --- a/docs/workloads/realtime/metrics.md +++ b/docs/workloads/realtime/metrics.md @@ -58,8 +58,5 @@ class PythonPredictor: self.metrics.histogram(metric="my_histogram", value=100, tags={"model": "v1"}) ``` -Refer to the [observability documentation](../observability/metrics.md#custom-user-metrics) for more information on -custom metrics. - **Note**: The metrics client uses the UDP protocol to push metrics, to be fault tolerant, so if it fails during a metrics push there is no exception thrown. diff --git a/docs/workloads/task/metrics.md b/docs/workloads/task/metrics.md index 77c012e567..8a7a390872 100644 --- a/docs/workloads/task/metrics.md +++ b/docs/workloads/task/metrics.md @@ -24,8 +24,5 @@ class Task: self.metrics.histogram(metric="my_histogram", value=100, tags={"model": "v1"}) ``` -Refer to the [observability documentation](../observability/metrics.md#custom-user-metrics) for more information on -custom metrics. - **Note**: The metrics client uses the UDP protocol to push metrics, to be fault tolerant, so if it fails during a metrics push there is no exception thrown.