
Performance with Cortex local server is 6X slower than when model is run from Jupyter notebook #1774

@lminer

Description

I'm finding that my TensorFlow model is ~6X slower when run from the local server than when it is run from a Jupyter notebook. I've checked nvtop while the model is running, and the GPU does appear to be used, although only for a very brief portion of the overall time. I've also tried running the model in BentoML; it's slower there too, but only ~3X. Speeds are comparable when I run from AWS, although in that case I'm using a T4 rather than the RTX 2080 Ti I use locally. Any suggestions on how I might diagnose the cause of the slowdown? Here are my config files (the API configuration, followed by cluster.yaml); a timing sketch follows them below:

- name: Foo
  kind: RealtimeAPI
  predictor:
    type: tensorflow
    path: serving/cortex_server.py
    models:
      path: foo
      signature_key: serving_default
    image: quay.io/robertlucian/tensorflow-predictor:0.25.0-tfs
    tensorflow_serving_image: quay.io/robertlucian/cortex-tensorflow-serving-gpu-tf2.4:0.25.0
  compute:
    gpu: 1
  autoscaling:
    min_replicas: 1
    max_replicas: 1
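
The serving/cortex_server.py referenced above is not included in the issue. For context, a Cortex 0.25 TensorFlow predictor implements the interface sketched below (a generic sketch, not the issue author's actual code); any Python-side pre/post-processing in predict() runs per request on the API container, outside TensorFlow Serving, and is a common source of added per-request latency:

# serving/cortex_server.py -- minimal sketch of the Cortex 0.25 TensorFlow predictor interface
class TensorFlowPredictor:
    def __init__(self, tensorflow_client, config):
        # tensorflow_client proxies predict() calls to the TensorFlow Serving container
        self.client = tensorflow_client

    def predict(self, payload):
        # heavy Python transforms here add CPU-side latency on every request
        return self.client.predict(payload)
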
# cluster.yaml

# EKS cluster name
cluster_name: foo

# AWS region
region: us-east-1

# list of availability zones for your region
availability_zones: # default: 3 random availability zones in your region, e.g. [us-east-1a, us-east-1b, us-east-1c]

# instance type
instance_type: g4dn.xlarge

# minimum number of instances
min_instances: 1

# maximum number of instances
max_instances: 1

# disk storage size per instance (GB)
instance_volume_size: 50

# instance volume type [gp2 | io1 | st1 | sc1]
instance_volume_type: gp2

# instance volume iops (only applicable to io1)
# instance_volume_iops: 3000

# subnet visibility [public (instances will have public IPs) | private (instances will not have public IPs)]
subnet_visibility: private

# NAT gateway (required when using private subnets) [none | single | highly_available (a NAT gateway per availability zone)]
nat_gateway: single

# API load balancer scheme [internet-facing | internal]
api_load_balancer_scheme: internal

# operator load balancer scheme [internet-facing | internal]
# note: if using "internal", you must configure VPC Peering to connect your CLI to your cluster operator
operator_load_balancer_scheme: internet-facing

# to install Cortex in an existing VPC, you can provide a list of subnets for your cluster to use
# subnet_visibility (specified above in this file) must match your subnets' visibility
# this is an advanced feature (not recommended for first-time users) and requires your VPC to be configured correctly; see https://eksctl.io/usage/vpc-networking/#use-existing-vpc-other-custom-configuration
# here is an example:
# subnets:
#   - availability_zone: us-west-2a
#     subnet_id: subnet-060f3961c876872ae
#   - availability_zone: us-west-2b
#     subnet_id: subnet-0faed05adf6042ab7

# additional tags to assign to AWS resources (all resources will automatically be tagged with cortex.dev/cluster-name: <cluster_name>)
tags: # <string>: <string> map of key/value pairs

# whether to use spot instances in the cluster (default: false)
spot: true

spot_config:
  # additional instance types with identical or better specs than the primary cluster instance type (defaults to only the primary instance type)
  instance_distribution: # [similar_instance_type_1, similar_instance_type_2]

  # minimum number of on demand instances (default: 0)
  on_demand_base_capacity: 0

  # percentage of on demand instances to use after the on demand base capacity has been met [0, 100] (default: 50)
  # note: setting this to 0 may hinder cluster scale up when spot instances are not available
  on_demand_percentage_above_base_capacity: 0

  # max price for spot instances (default: the on-demand price of the primary instance type)
  max_price: # <float>

  # number of spot instance pools across which to allocate spot instances [1, 20] (default: number of instances in instance distribution)
  instance_pools: 3

  # fallback to on-demand instances if spot instances were unable to be allocated (default: true)
  on_demand_backup: true

# SSL certificate ARN (only necessary when using a custom domain)
ssl_certificate_arn:

# primary CIDR block for the cluster's VPC
vpc_cidr: 192.168.0.0/16
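
To narrow down where the time goes, one option is to time the local API endpoint against in-process inference on the same SavedModel. This is a sketch, not from the issue; the endpoint URL, payload, input shape, and input name are placeholders that must be adapted to the actual model signature:

import time
import requests
import tensorflow as tf

ENDPOINT = "http://localhost:8888"   # placeholder; `cortex get <api-name>` prints the real URL
PAYLOAD = {"inputs": [[0.0] * 128]}  # placeholder; must match the model's serving signature

def time_endpoint(n=20):
    requests.post(ENDPOINT, json=PAYLOAD).raise_for_status()  # warm-up request
    start = time.perf_counter()
    for _ in range(n):
        requests.post(ENDPOINT, json=PAYLOAD).raise_for_status()
    return (time.perf_counter() - start) / n

def time_in_process(n=20):
    model = tf.saved_model.load("foo")  # same SavedModel directory the API serves
    infer = model.signatures["serving_default"]
    x = tf.zeros([1, 128])              # placeholder input shape
    infer(input_1=x)                    # placeholder input name; check infer.structured_input_signature
    start = time.perf_counter()
    for _ in range(n):
        infer(input_1=x)
    return (time.perf_counter() - start) / n

print("endpoint:   %.1f ms/request" % (time_endpoint() * 1000))
print("in-process: %.1f ms/request" % (time_in_process() * 1000))

If the in-process number matches the notebook and only the endpoint number is slow, the overhead is likely in request serialization, the predictor's Python code, or the hop to TensorFlow Serving rather than in GPU execution itself.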
