Skip to content

kepricon/dl4j-benchmark

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Benchmarks

Repository to track Dl4j benchmarks in relation to well known architectures on CPUs and GPUs.

Purpose

These benchmarks are designed to show the comparison performance between CPUs and GPUs. We tested various sort of neural networks including CNNs, RNNs, MLP and also tested MultiLayerNetwork and ComputationGraph as well. for more details, you can refer to Benchmark Details below.

How to run the benchmarks

# build with cudnn8.0 (for cpus, -P native, for gpus -P cuda8)
$ mvn clean package -DskipTests -P cudnn8

# run VGG16 benchmark for 16x3x224x224 input
$ java -cp dl4j-core-benchmark/dl4j-core-benchmark.jar org.deeplearning4j.benchmarks.BenchmarkTinyImageNet --modelType VGG16 -w 224 -h 224 -c 3 -b 16

Benchmark Environment

Device DGX-1
Operating System GNU/Linux Ubuntu 14.04.4 LTS
CPU Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz
CPU Cores 80
BLAS Vendor for CPU OPENBLAS
GPU (#) Tesla P100-SXM2-16GB (8)
BLAS Vendor for GPU CUBLAS, CUDA 8.0
CUDNN v5.1
DL4J Version 0.8.0

Benchmark Details

MLP

  • Input : 64x784
  • Total Params : 1796010
  • Total Layers : 3
CPU GPU Multi
Avg Feedforward (ms) 14.32 1.95
Avg Backprop (ms) 24.8 2.48
Avg Iteration (ms) 50.51 15.21
Avg Samples/sec 1429.2 11491.08 23389.24
Avg Batches/sec 22.33 179.55 366.79

LeNet

  • Input : 64x1x28x28
  • Total Params : 431080
  • Total Layers : 6
CPU GPU Multi
Avg Feedforward (ms) 44.21 3.29
Avg Backprop (ms) 94.32 6.21
Avg Iteration (ms) 170.35 16.72
Avg Samples/sec 420.51 10280.08 20531.8
Avg Batches/sec 6.57 160.63 321.14

LSTM

  • Input : 64x300x256
  • Total Params : 571650
  • Total Layers : 2
CPU GPU Multi
Avg Feedforward (ms) 825.28 233.66
Avg Backprop (ms) 2820.08 792.96
Avg Iteration (ms) 5905.69 1285
Avg Samples/sec 11.46 49.34 189.96
Avg Batches/sec 0.18 0.77 2.97

AlexNet

  • Input : 32x3x224x224
  • Total Params : 59100744
  • Total Layers : 13
CPU GPU Multi
Avg Feedforward (ms) 812.67 307.46
Avg Backprop (ms) 2083.62 1105.46
Avg Iteration (ms) 3710.5 2335.57
Avg Samples/sec 8.59 13.52 52.39
Avg Batches/sec 0.27 0.42 1.64

INCEPTIONRESNETV1

  • Input : 32x3x160x160
  • Total Params : 16003768
  • Total Layers : 301
CPU GPU Multi
Avg Feedforward (ms) 2806.36 62.98
Avg Backprop (ms) 10426.49 205.49
Avg Iteration (ms) 17373.15 582.21
Avg Samples/sec 1.85 56.07 148.72
Avg Batches/sec 0.06 1.75 4.63

VGG16

  • Input : 32x3x224x224
  • Total Params : 135079944
  • Total Layers : 21
CPU GPU Multi
Avg Feedforward (ms) 14452.92 1245.66
Avg Backprop (ms) 40445.12 2834.26
Avg Iteration (ms) 52013.42 6299.24
Avg Samples/sec 0.52 5.03 13.1
Avg Batches/sec 0.02 0.16 0.4

APPENDIX

MLP

  • Implementation Code : here
  • Network Summary : here

LeNet

  • Implementation Code : here
  • Network Summary : here

LSTM

  • Implementation Code : here
  • Network Summary : here

AlexNet

VGG16

INCEPTIONRESNETV1

About

Repo to track dl4j benchmark code.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Java 96.4%
  • Shell 2.2%
  • Lua 1.1%
  • Batchfile 0.3%