This project provides a comprehensive implementation of key convolutional neural network (CNN) operations and the U-Net architecture, leveraging OpenCL for GPU acceleration. The project includes both CPU and GPU versions of convolution, activation, pooling, normalization, concatenation, upsampling, and sigmoid functions, along with utilities for benchmarking and result validation.
- Efficient convolution routines for both CPU and GPU
- ReLU activation and max pooling layers
- Batch normalization and tensor concatenation
- Upsampling and sigmoid activation functions
- Extraction and conversion of pretrained weights for custom inference
- Automated comparison of CPU, GPU, and PyTorch outputs for validation
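As a rough illustration of what several of the operations listed above compute, here is a minimal NumPy sketch. It is not the project's implementation (that lives in conv2d.cpp and conv2d.cl); the function names, the (channels, height, width) layout, the odd kernel size, the "same" zero padding, and the nearest-neighbour upsampling are all assumptions made for this example.

```python
import numpy as np

def conv2d(x, w, b):
    """Naive stride-1 convolution with zero 'same' padding (assumes odd kernel sizes).
    x: (C_in, H, W), w: (C_out, C_in, kH, kW), b: (C_out,)"""
    c_out, _, kh, kw = w.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((0, 0), (ph, ph), (pw, pw)))  # zero-pad height and width only
    _, h, wd = x.shape
    out = np.empty((c_out, h, wd), dtype=np.float32)
    for o in range(c_out):
        for i in range(h):
            for j in range(wd):
                # Multiply the window by the o-th filter and sum over all input channels
                out[o, i, j] = np.sum(xp[:, i:i + kh, j:j + kw] * w[o]) + b[o]
    return out

def relu(x):
    return np.maximum(x, 0)

def max_pool2x2(x):
    """2x2 max pooling with stride 2; trailing odd rows/columns are dropped."""
    c, h, w = x.shape
    trimmed = x[:, :h // 2 * 2, :w // 2 * 2]
    return trimmed.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def upsample2x(x):
    """Nearest-neighbour upsampling by a factor of 2 (an assumed mode)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Usage: one conv -> ReLU -> pool stage on a tiny all-ones input
x = np.ones((1, 4, 4), dtype=np.float32)
w = np.ones((1, 1, 3, 3), dtype=np.float32)
y = max_pool2x2(relu(conv2d(x, w, np.zeros(1, dtype=np.float32))))
print(y.shape)  # (1, 2, 2)
```

A slow reference like this can be handy for sanity-checking a single layer's output on small inputs before comparing full-size results.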
- OpenCL SDK and drivers
- OpenCV
- cnpy (for .npy file support)
- Meson build system
- Python packages: numpy, pandas, torch, opencv-python, matplotlib
Update your package list and install Intel OpenCL drivers:
sudo apt update
sudo apt install intel-opencl-icd
Verify your OpenCL installation:
clinfo
You should see details about your OpenCL platforms and devices.
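If you want to script this check, a small Python sketch along the following lines may help. It assumes that clinfo prints a "Number of platforms" line, which is its usual default output; the helper names are hypothetical and not part of this project.

```python
import re
import shutil
import subprocess

def platform_count(clinfo_text):
    """Return the platform count found in clinfo's text output, or 0 if the line is missing."""
    match = re.search(r"Number of platforms\s+(\d+)", clinfo_text)
    return int(match.group(1)) if match else 0

def opencl_available():
    """True if clinfo is on PATH and reports at least one OpenCL platform."""
    if shutil.which("clinfo") is None:
        return False
    result = subprocess.run(["clinfo"], capture_output=True, text=True)
    return platform_count(result.stdout) > 0

if __name__ == "__main__":
    print("OpenCL platform found" if opencl_available() else "No OpenCL platform detected")
```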
Install required libraries:
sudo apt-get install opencl-headers ocl-icd-opencl-dev
sudo apt-get install libopencv-dev
sudo apt-get install libboost-all-dev
sudo apt-get install cmake meson
pip install -r requirements.txt
Add cnpy for .npy file support:
git clone https://github.com/rogersce/cnpy.git
cp -r cnpy src/
Download and extract pretrained weights:
Download the ZF_UNET_224 weights from ZF_UNET_224_Pretrained_Model and extract them:
python3 extractWeights.py zf_unet_224.h5
This will save all necessary kernels and parameters in the pretrainedKernels directory.
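For orientation, the sketch below shows one way a Keras convolution kernel could be reordered and written to a .npy file. Keras stores Conv2D kernels as (kH, kW, C_in, C_out), while PyTorch expects (C_out, C_in, kH, kW). Whether extractWeights.py performs this reordering is not stated here, and the file name and helper below are hypothetical.

```python
import os
import tempfile
import numpy as np

def keras_to_oihw(kernel):
    """Reorder a Keras Conv2D kernel (kH, kW, C_in, C_out) to (C_out, C_in, kH, kW)."""
    return np.ascontiguousarray(np.transpose(kernel, (3, 2, 0, 1)))

# Stand-in for a real kernel: 3x3 filters, 3 input channels, 32 output channels
k = np.random.rand(3, 3, 3, 32).astype(np.float32)

with tempfile.TemporaryDirectory() as out_dir:
    path = os.path.join(out_dir, "conv1_kernel.npy")  # hypothetical file name
    np.save(path, keras_to_oihw(k))
    loaded = np.load(path)

print(loaded.shape)  # (32, 3, 3, 3)
```

Storing the array contiguously in the target layout means a C++ reader such as cnpy can index it directly without a second transpose.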
OpenCL/
├── pretrainedKernels
├── gpu
│   ├── builddir
│   │   ├── npy
│   │   ├── results
│   ├── lib
│   │   ├── Core
│   │   ├── lib
│   │   ├── OpenCL
│   │   └── vx
│   ├── src
│   │   ├── cnpy
│   │   ├── conv2d.cl
│   │   ├── conv2d.cpp
│   ├── meson.build
│   ├── run.sh
│   ├── inputImages
Generate a sample image for inference:
python3 genImage.py
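If you would rather build a test input by hand, something like the following sketch produces a random image and stores it as a .npy file. The 224x224 size follows from the model name; the channel count, channel-last layout, [0, 1) value range, and output file name are assumptions and may differ from what genImage.py actually writes.

```python
import os
import tempfile
import numpy as np

def make_sample_image(height=224, width=224, channels=3, seed=0):
    """Return a reproducible random float32 image with values in [0, 1)."""
    rng = np.random.default_rng(seed)
    return rng.random((height, width, channels), dtype=np.float32)

img = make_sample_image()

# Written to a temporary directory here; point this at your own input folder instead
with tempfile.TemporaryDirectory() as out_dir:
    path = os.path.join(out_dir, "sample.npy")  # hypothetical file name
    np.save(path, img)
    restored = np.load(path)

print(restored.shape, restored.dtype)
```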
Build and run the project:
chmod +x run.sh
./run.sh <PATH_TO_IMG_NPY_FILE>
The results from both CPU and GPU executions will be saved in the results directory.
- gpu/src/conv2d.cpp: Main implementation for convolution, pooling, upsampling, and U-Net inference using OpenCL.
- gpu/src/conv2d.cl: OpenCL kernel code for GPU-accelerated operations.
- gpu/pytorch.py: Reference PyTorch implementation for all core operations; outputs are saved as .npy files for validation.
- gpu/compare.py: Compares outputs from CPU, GPU, and PyTorch implementations, reporting differences for each operation.
- gpu/inputImages/genImage.py: Utility to create sample images and corresponding .npy files for testing.
- extractWeights.py: Extracts and saves weights and parameters from a pretrained Keras U-Net model.
- tfModel.py: Defines and builds the ZF_UNET_224 U-Net model in Keras, with support for loading pretrained weights and exporting intermediate results for image segmentation tasks.
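To give a sense of the kind of check a comparison script can run, here is a minimal sketch that reports the difference between a reference array and a candidate array. It is not the logic of gpu/compare.py; the function name, the returned fields, and the 1e-4 tolerance are assumptions chosen for illustration.

```python
import numpy as np

def compare(name, reference, candidate, atol=1e-4):
    """Summarize the element-wise difference between two arrays of the same shape."""
    if reference.shape != candidate.shape:
        raise ValueError(f"{name}: shape mismatch {reference.shape} vs {candidate.shape}")
    # Compute the difference in float64 so the comparison itself adds no rounding error
    diff = np.abs(reference.astype(np.float64) - candidate.astype(np.float64))
    return {
        "name": name,
        "max_abs_diff": float(diff.max()),
        "mean_abs_diff": float(diff.mean()),
        "match": bool(diff.max() <= atol),
    }

# Usage with stand-in data; in practice the arrays would come from np.load on saved outputs
ref = np.linspace(0, 1, 8, dtype=np.float32).reshape(2, 4)
gpu = ref + np.float32(1e-6)
report = compare("conv2d", ref, gpu)
print(report["name"], "match" if report["match"] else "MISMATCH")
```

An absolute tolerance suits outputs of roughly unit scale; for layers with large activations a relative tolerance, as in np.allclose, may be the better choice.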