
OpenCL-Accelerated Conv2D and U-Net Operations

This project provides a comprehensive implementation of key convolutional neural network (CNN) operations and the U-Net architecture, leveraging OpenCL for GPU acceleration. The project includes both CPU and GPU versions of convolution, activation, pooling, normalization, concatenation, upsampling, and sigmoid functions, along with utilities for benchmarking and result validation.
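
As a point of reference, the CPU baseline of a convolution layer reduces to a direct (naive) 2D convolution. The sketch below is illustrative only, covering a single channel with unit stride and zero padding; the function name, signature, and buffer layout are assumptions and do not mirror the project's actual code.

#include <cstddef>
#include <vector>

// Direct 2D convolution for one channel: input is H x W, kernel is K x K
// (K odd), zero padding, stride 1, so the output is also H x W.
std::vector<float> conv2d_cpu(const std::vector<float>& input,
                              const std::vector<float>& kernel,
                              std::size_t H, std::size_t W, std::size_t K) {
    const int pad = static_cast<int>(K) / 2;
    std::vector<float> output(H * W, 0.0f);
    for (std::size_t y = 0; y < H; ++y) {
        for (std::size_t x = 0; x < W; ++x) {
            float acc = 0.0f;
            for (std::size_t ky = 0; ky < K; ++ky) {
                for (std::size_t kx = 0; kx < K; ++kx) {
                    const int iy = static_cast<int>(y + ky) - pad;
                    const int ix = static_cast<int>(x + kx) - pad;
                    if (iy >= 0 && iy < static_cast<int>(H) &&
                        ix >= 0 && ix < static_cast<int>(W)) {
                        acc += input[iy * W + ix] * kernel[ky * K + kx];
                    }
                }
            }
            output[y * W + x] = acc;
        }
    }
    return output;
}

A GPU version typically maps the two outer loops onto the OpenCL NDRange so that each work-item computes one output pixel.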

Highlights

  • Efficient convolution routines for both CPU and GPU
  • ReLU activation and max pooling layers
  • Batch normalization and tensor concatenation
  • Upsampling and sigmoid activation functions
  • Extraction and conversion of pretrained weights for custom inference
  • Automated comparison of CPU, GPU, and PyTorch outputs for validation

Dependencies

  • OpenCL SDK and drivers
  • OpenCV
  • cnpy (for .npy file support)
  • Meson build system
  • Python packages: numpy, pandas, torch, opencv-python, matplotlib

Setting Up OpenCL on Ubuntu

Update your package list and install the Intel OpenCL runtime (if your GPU is not an Intel device, install your vendor's OpenCL ICD instead):

sudo apt update
sudo apt install intel-opencl-icd

Verify your OpenCL installation (install clinfo with sudo apt install clinfo if it is not already present):

clinfo

You should see details about your OpenCL platforms and devices.
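
The same information can also be queried programmatically through the standard OpenCL C API. The snippet below is a minimal sketch (it is not part of this repository) and only lists platform and device names; compile it with a C++ compiler and link against -lOpenCL.

#define CL_TARGET_OPENCL_VERSION 120
#include <CL/cl.h>
#include <cstdio>
#include <vector>

// List every OpenCL platform and its devices, a small subset of what clinfo prints.
int main() {
    cl_uint num_platforms = 0;
    clGetPlatformIDs(0, nullptr, &num_platforms);
    if (num_platforms == 0) { std::printf("No OpenCL platforms found\n"); return 1; }

    std::vector<cl_platform_id> platforms(num_platforms);
    clGetPlatformIDs(num_platforms, platforms.data(), nullptr);

    for (cl_platform_id p : platforms) {
        char name[256] = {0};
        clGetPlatformInfo(p, CL_PLATFORM_NAME, sizeof(name), name, nullptr);
        std::printf("Platform: %s\n", name);

        cl_uint num_devices = 0;
        clGetDeviceIDs(p, CL_DEVICE_TYPE_ALL, 0, nullptr, &num_devices);
        std::vector<cl_device_id> devices(num_devices);
        clGetDeviceIDs(p, CL_DEVICE_TYPE_ALL, num_devices, devices.data(), nullptr);

        for (cl_device_id d : devices) {
            char dev_name[256] = {0};
            clGetDeviceInfo(d, CL_DEVICE_NAME, sizeof(dev_name), dev_name, nullptr);
            std::printf("  Device: %s\n", dev_name);
        }
    }
    return 0;
}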

Installation Steps

  1. Install required libraries:

    sudo apt-get install opencl-headers ocl-icd-opencl-dev
    sudo apt-get install libopencv-dev
    sudo apt-get install libboost-all-dev
    sudo apt-get install cmake meson
    pip install -r requirements.txt
  2. Add cnpy for .npy file support:

    git clone https://github.com/rogersce/cnpy.git
    cp -r cnpy src/
  3. Download and extract pretrained weights:

    Download the ZF_UNET_224 weights from the ZF_UNET_224_Pretrained_Model repository, then extract them with:

    python3 extractWeights.py zf_unet_224.h5

    This will save all necessary kernels and parameters in the pretrainedKernels directory.
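
Once extracted, the .npy kernel files can be read from C++ through cnpy. The sketch below is illustrative; the file name is hypothetical, and the real names and shapes are whatever extractWeights.py writes into pretrainedKernels.

#include "cnpy.h"
#include <cstddef>
#include <cstdio>

int main() {
    // Hypothetical file name; extractWeights.py defines the actual naming scheme.
    cnpy::NpyArray arr = cnpy::npy_load("pretrainedKernels/conv1_weights.npy");
    const float* w = arr.data<float>();

    std::printf("loaded tensor with %zu dims:", arr.shape.size());
    for (std::size_t d : arr.shape) std::printf(" %zu", d);
    std::printf("\nfirst weight = %f\n", w[0]);
    return 0;
}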

Project Directory Layout

OpenCL/
├── pretrainedKernels
└── gpu
    ├── builddir
    │   ├── npy
    │   └── results
    ├── lib
    │   ├── Core
    │   ├── lib
    │   ├── OpenCL
    │   └── vx
    ├── src
    │   ├── cnpy
    │   ├── conv2d.cl
    │   └── conv2d.cpp
    ├── meson.build
    ├── run.sh
    └── inputImages
Running the Pipeline

  1. Generate a sample image and its .npy file for inference (a C++ sketch of the file format follows this list):

    python3 genImage.py
  2. Build and run the project:

    chmod +x run.sh
    ./run.sh <PATH_TO_IMG_NPY_FILE>

    The results from both CPU and GPU executions will be saved in the results directory.
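
genImage.py is the supported way to produce the input. Purely to illustrate the file format it emits, the C++ sketch below writes a float32 .npy with cnpy; the 3x224x224 shape and the output path are assumptions, and the actual layout is defined by genImage.py.

#include "cnpy.h"
#include <cstddef>
#include <cstdlib>
#include <vector>

int main() {
    // Assumed shape (3 channels x 224 x 224, float32); genImage.py defines the real one.
    const std::vector<std::size_t> shape = {3, 224, 224};
    std::vector<float> img(3 * 224 * 224);
    for (float& px : img)
        px = static_cast<float>(std::rand()) / RAND_MAX;  // random test pattern

    // Hypothetical output path.
    cnpy::npy_save("inputImages/test_image.npy", img.data(), shape, "w");
    return 0;
}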

Source Overview

  • gpu/src/conv2d.cpp: Main implementation for convolution, pooling, upsampling, and U-Net inference using OpenCL.
  • gpu/src/conv2d.cl: OpenCL kernel code for GPU-accelerated operations.
  • gpu/pytorch.py: Reference PyTorch implementation for all core operations; outputs are saved as .npy files for validation.
  • gpu/compare.py: Compares outputs from the CPU, GPU, and PyTorch implementations, reporting the differences for each operation (a minimal standalone sketch follows this list).
  • gpu/inputImages/genImage.py: Utility to create sample images and corresponding .npy files for testing.
  • extractWeights.py: Extracts and saves weights and parameters from a pretrained Keras U-Net model.
  • tfModel.py: Defines and builds the ZF_UNET_224 U-Net architecture in Keras, optionally loading pretrained weights and exporting intermediate results for image segmentation tasks.
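
compare.py is the project's validation tool. The sketch below shows the same idea as a standalone C++ check with cnpy, reporting the maximum absolute difference between two result files; the file names are placeholders for whatever the CPU and GPU runs write into the results directory.

#include "cnpy.h"
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdio>

int main() {
    // Placeholder file names; the real outputs are produced by run.sh.
    cnpy::NpyArray cpu = cnpy::npy_load("results/conv1_cpu.npy");
    cnpy::NpyArray gpu = cnpy::npy_load("results/conv1_gpu.npy");

    // Element counts from the stored shapes; take the smaller in case they differ.
    std::size_t n_cpu = 1, n_gpu = 1;
    for (std::size_t d : cpu.shape) n_cpu *= d;
    for (std::size_t d : gpu.shape) n_gpu *= d;
    const std::size_t n = std::min(n_cpu, n_gpu);

    const float* a = cpu.data<float>();
    const float* b = gpu.data<float>();

    float max_diff = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        max_diff = std::max(max_diff, std::fabs(a[i] - b[i]));

    std::printf("max |CPU - GPU| = %g over %zu values\n", max_diff, n);
    return 0;
}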
