Efficient Neural Network Deployment on Heterogeneous TinyML Platforms
HTVM is a deep learning compiler for deploying neural networks on heterogeneous embedded compute platforms with multiple scratchpad-managed accelerators. HTVM generates self-contained C code that runs and dispatches neural network layers to either the platform's CPU or one of its accelerators.
To do this, HTVM mainly relies on:
- Apache TVM to generate CPU kernels, and run different layers sequentially.
- DORY to generate accelerator kernels with optimized scratchpad memory management.
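To illustrate the dispatch model described above, here is a minimal, hypothetical sketch in Python (not HTVM's actual API or its generated C code): each layer is lowered to a kernel that runs on either the CPU or an accelerator, and the layers execute sequentially. All names and kernels below are illustrative stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Layer:
    name: str
    target: str                       # "cpu" or "accel" (illustrative labels)
    kernel: Callable[[List[int]], List[int]]

def conv_accel(x):
    # Stand-in for a DORY-generated accelerator kernel; on real hardware this
    # would tile the data into the accelerator's scratchpad before launching.
    return [2 * v for v in x]

def relu_cpu(x):
    # Stand-in for a TVM-generated CPU kernel.
    return [max(0, v) for v in x]

def run(net, x):
    # Run the layers sequentially, dispatching each to its target.
    for layer in net:
        x = layer.kernel(x)
    return x

net = [Layer("conv1", "accel", conv_accel), Layer("relu1", "cpu", relu_cpu)]
print(run(net, [-1, 2, -3, 4]))  # -> [0, 4, 0, 8]
```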
Main requirements:
- TVM (contained in this repository) and tools to compile TVM.
- DORY version 8a0fe7bcadb207c6d80820a4bd2c2f2c0e823248
- Python 3.8
For DIANA, HTVM also requires:
- The adapted PULP-SDK for DIANA
- DORY Backend kernels for DIANA
- The PULP RISC-V GNU Toolchain
For your convenience, we advise using our Docker container, which comes with all dependencies needed for building TVM preinstalled.
We use podman commands here, but note that you can use docker as well if preferred.
Our GitHub CI has an up-to-date image available that you can pull with:

```sh
podman pull ghcr.io/kuleuven-micas/htvm:main
```

Or you can build the container image locally with:
```sh
git clone --recursive https://github.com/KULeuven-MICAS/htvm
cd htvm
podman build . -f diana/docker/Dockerfile.tvm -t htvm:main
```

Note: see the Dockerfile in case you want to attempt installation without a container.
If you haven't already cloned the repo, do:
```sh
git clone --recursive https://github.com/KULeuven-MICAS/htvm
cd htvm
```

Now create and start a container:

```sh
podman run -itv=`pwd`:/tvm-fork:z htvm:main
```

Inside the container shell, run:
```sh
mkdir build
cp diana/config.cmake build
cd build
cmake ..
make -j$(nproc)
cd ..
```

Test if it works (also run from inside the container):
```sh
cd diana/byoc
python3 driver.py -h
```

A number of ONNX example models, quantized by diana-quantlib, are provided in this repo through git LFS. For quantizing your own models, see diana-quantlib.
Download the model data with:
```sh
git lfs pull
```

Compile a model for DIANA with digital acceleration:

```sh
python3 driver.py --no-run --onnx test_data/export_resnet8/ResNet_QL_NOANNOTATION.onnx
```

Output C code and PULP binaries can be found at `/tmp/digital_pulp_dory_fused_O3_None/pulp/`.
To compile a model for the CPU of your local machine:

```sh
python3 driver.py --no-run --device x86 --target c --onnx test_data/export_resnet8/ResNet_QL_NOANNOTATION.onnx
```

Output C code and x86 binaries can be found at `/tmp/digital_x86_c_fused_O3_None/x86`.
Run it locally with:
```sh
/tmp/digital_x86_c_fused_O3_None/x86/demo
```

In addition to the standard test suite provided by TVM, HTVM contains its own unit tests and end-to-end tests.
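If you want to invoke the compiled binary from a script instead of the shell, a small hypothetical wrapper like the following can be used; `run_demo` is not part of HTVM, and the path is simply the output location from the compile step above.

```python
import os
import subprocess

# Output location of the x86 build from the compile step above.
DEMO = "/tmp/digital_x86_c_fused_O3_None/x86/demo"

def run_demo(path=DEMO):
    """Run the demo binary if it exists and return its stdout, else None."""
    if not os.path.exists(path):
        return None
    result = subprocess.run([path], capture_output=True, text=True, check=True)
    return result.stdout

out = run_demo()
if out is None:
    print("demo binary not found at", DEMO)
else:
    print(out)
```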
The unit tests can be run with:
```sh
cd /path/to/htvm
pytest -v tests/python/contrib/test_soma_dory
```

The end-to-end tests rely on example ONNX files that are tracked with git lfs. Run `git lfs pull` in case you haven't done so already.
Now run:
```sh
cd diana/byoc
pytest -v test.py
```

HTVM currently supports deploying a number of tested neural networks on the DIANA heterogeneous SoC.
The front-end supports ingesting quantized neural networks in ONNX format from Quantlib.
HTVM is Apache 2.0 Licensed.
This repository started off as a fork of the Apache TVM project at commit 2af3ab1e36e0e78bac8448a0357abee317fabb1f, but has since been rebased on upstream several times.