
Performance with Cortex local server is 6X slower than when model is run from Jupyter notebook #1774

@lminer

Description

I'm finding that my TensorFlow model is ~6X slower when run from the local server than when it is run from a Jupyter notebook. I've checked nvtop while the model is running, and the GPU does appear to be used, although only for a very brief portion of the overall time. I've also tried running the model in BentoML; it's slower there too, but only ~3X. Speeds are comparable when I run from AWS, although in that case I'm using a T4 rather than the RTX 2080 Ti I use locally. Any suggestions on how I might diagnose the cause of the slowdown? Here are my config files (the API configuration, followed by cluster.yaml); a timing sketch follows them below:

- name: Foo
  kind: RealtimeAPI
  predictor:
    type: tensorflow
    path: serving/cortex_server.py
    models:
      path: foo
      signature_key: serving_default
    image: quay.io/robertlucian/tensorflow-predictor:0.25.0-tfs
    tensorflow_serving_image: quay.io/robertlucian/cortex-tensorflow-serving-gpu-tf2.4:0.25.0
  compute:
    gpu: 1
  autoscaling:
    min_replicas: 1
    max_replicas: 1
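
The serving/cortex_server.py referenced above is not included in the issue. For context, a Cortex 0.25 TensorFlow predictor implements the interface sketched below (a generic sketch, not the issue author's actual code); any Python-side pre/post-processing in predict() runs per request on the API container, outside TensorFlow Serving, and is a common source of added per-request latency:

# serving/cortex_server.py -- minimal sketch of the Cortex 0.25 TensorFlow predictor interface
class TensorFlowPredictor:
    def __init__(self, tensorflow_client, config):
        # tensorflow_client proxies predict() calls to the TensorFlow Serving container
        self.client = tensorflow_client

    def predict(self, payload):
        # heavy Python transforms here add CPU-side latency on every request
        return self.client.predict(payload)
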
# cluster.yaml

# EKS cluster name
cluster_name: foo

# AWS region
region: us-east-1

# list of availability zones for your region
availability_zones: # default: 3 random availability zones in your region, e.g. [us-east-1a, us-east-1b, us-east-1c]

# instance type
instance_type: g4dn.xlarge

# minimum number of instances
min_instances: 1

# maximum number of instances
max_instances: 1

# disk storage size per instance (GB)
instance_volume_size: 50

# instance volume type [gp2 | io1 | st1 | sc1]
instance_volume_type: gp2

# instance volume iops (only applicable to io1)
# instance_volume_iops: 3000

# subnet visibility [public (instances will have public IPs) | private (instances will not have public IPs)]
subnet_visibility: private

# NAT gateway (required when using private subnets) [none | single | highly_available (a NAT gateway per availability zone)]
nat_gateway: single

# API load balancer scheme [internet-facing | internal]
api_load_balancer_scheme: internal

# operator load balancer scheme [internet-facing | internal]
# note: if using "internal", you must configure VPC Peering to connect your CLI to your cluster operator
operator_load_balancer_scheme: internet-facing

# to install Cortex in an existing VPC, you can provide a list of subnets for your cluster to use
# subnet_visibility (specified above in this file) must match your subnets' visibility
# this is an advanced feature (not recommended for first-time users) and requires your VPC to be configured correctly; see https://eksctl.io/usage/vpc-networking/#use-existing-vpc-other-custom-configuration
# here is an example:
# subnets:
#   - availability_zone: us-west-2a
#     subnet_id: subnet-060f3961c876872ae
#   - availability_zone: us-west-2b
#     subnet_id: subnet-0faed05adf6042ab7

# additional tags to assign to AWS resources (all resources will automatically be tagged with cortex.dev/cluster-name: <cluster_name>)
tags: # <string>: <string> map of key/value pairs

# whether to use spot instances in the cluster (default: false)
spot: true

spot_config:
  # additional instance types with identical or better specs than the primary cluster instance type (defaults to only the primary instance type)
  instance_distribution: # [similar_instance_type_1, similar_instance_type_2]

  # minimum number of on demand instances (default: 0)
  on_demand_base_capacity: 0

  # percentage of on demand instances to use after the on demand base capacity has been met [0, 100] (default: 50)
  # note: setting this to 0 may hinder cluster scale up when spot instances are not available
  on_demand_percentage_above_base_capacity: 0

  # max price for spot instances (default: the on-demand price of the primary instance type)
  max_price: # <float>

  # number of spot instance pools across which to allocate spot instances [1, 20] (default: number of instances in instance distribution)
  instance_pools: 3

  # fallback to on-demand instances if spot instances were unable to be allocated (default: true)
  on_demand_backup: true

# SSL certificate ARN (only necessary when using a custom domain)
ssl_certificate_arn:

# primary CIDR block for the cluster's VPC
vpc_cidr: 192.168.0.0/16
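
To narrow down where the time goes, one option is to time the local API endpoint against in-process inference on the same SavedModel. This is a sketch, not from the issue; the endpoint URL, payload, input shape, and input name are placeholders that must be adapted to the actual model signature:

import time
import requests
import tensorflow as tf

ENDPOINT = "http://localhost:8888"   # placeholder; `cortex get <api-name>` prints the real URL
PAYLOAD = {"inputs": [[0.0] * 128]}  # placeholder; must match the model's serving signature

def time_endpoint(n=20):
    requests.post(ENDPOINT, json=PAYLOAD).raise_for_status()  # warm-up request
    start = time.perf_counter()
    for _ in range(n):
        requests.post(ENDPOINT, json=PAYLOAD).raise_for_status()
    return (time.perf_counter() - start) / n

def time_in_process(n=20):
    model = tf.saved_model.load("foo")  # same SavedModel directory the API serves
    infer = model.signatures["serving_default"]
    x = tf.zeros([1, 128])              # placeholder input shape
    infer(input_1=x)                    # placeholder input name; check infer.structured_input_signature
    start = time.perf_counter()
    for _ in range(n):
        infer(input_1=x)
    return (time.perf_counter() - start) / n

print("endpoint:   %.1f ms/request" % (time_endpoint() * 1000))
print("in-process: %.1f ms/request" % (time_in_process() * 1000))

If the in-process number matches the notebook and only the endpoint number is slow, the overhead is likely in request serialization, the predictor's Python code, or the hop to TensorFlow Serving rather than in GPU execution itself.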
