nathanjjohnson7/RCNN

RCNN

Implementation of Efficient Graph-Based Image Segmentation, Selective Search and RCNN

Steps

  1. Region Proposal: Efficient Graph-Based Image Segmentation is used to divide the input image into smaller, arbitrarily shaped regions
  2. Obtain Bounding Boxes: Selective Search hierarchically groups the regions from the previous step into progressively larger regions by comparing their color, texture, size and fill (how well two regions "fill" the bounding box that encloses them). The bounding box from each intermediate merge is stored, with the final bounding box enclosing the whole image.
  3. Predictions: The image patches inside the bounding boxes returned from Selective Search are resized to (224, 224) and passed through AlexNet (a convolutional neural network architecture), which predicts the class of each patch.
  4. Post-processing / Non-Maximum Suppression:
    • Discard all bounding boxes predicted as "background"
    • Overlapping boxes predicting the same object are filtered using confidence scores, keeping only the most confident box for each object.
    • The remaining bounding boxes are our final object detections.
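
The post-processing step above can be sketched roughly as follows. This is a minimal illustration, not the code from this repository: the `Detection` tuple layout, the `non_max_suppression` name, the `"background"` label string and the IoU threshold of 0.5 are all assumptions made for the example.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2), assumed layout
Detection = Tuple[Box, str, float]        # (box, predicted label, confidence)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections: List[Detection],
                        iou_threshold: float = 0.5,
                        background: str = "background") -> List[Detection]:
    """Drop background boxes, then keep the most confident box per object."""
    # Visit candidates from most to least confident.
    candidates = sorted((d for d in detections if d[1] != background),
                        key=lambda d: d[2], reverse=True)
    kept: List[Detection] = []
    for box, label, score in candidates:
        # Keep a box only if no already-kept box of the same class overlaps it.
        overlaps = any(label == k_label and iou(box, k_box) > iou_threshold
                       for k_box, k_label, _ in kept)
        if not overlaps:
            kept.append((box, label, score))
    return kept
```

Suppression here is done per class, so a "cat" box never removes an overlapping "dog" box. Whether the repository filters per class or across classes is not stated above.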

Efficient Graph-Based Image Segmentation

  1. A graph is created with 224x224 nodes, one for each pixel in the input image, with an edge between each pixel and its 8 neighbours. This is an undirected graph (the edges Pixel 0 -> Pixel 1 and Pixel 1 -> Pixel 0 are identical, and therefore won't be included twice).
  2. The weight of each edge is initialized to the difference in intensity between the associated pixels
  3. The edges are sorted in ascending order by weight (edges between the most similar pixels are at the front).
  4. A disjoint set object is initialized with 224x224 items
  5. We loop through the sorted edges and look up the set that each of the edge's two vertices belongs to. If the two vertices belong to different sets, and the edge weight is below a computed threshold, we join the sets.
  6. After looping through all the edges, the remaining sets in the disjoint set object correspond to the proposed regions
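
The steps above can be sketched in plain Python on a small grayscale grid. This is an illustrative sketch, not the repository's implementation: the names `DisjointSet` and `segment` are hypothetical, and the merge threshold is assumed to be the one from the Felzenszwalb and Huttenlocher paper (largest internal edge weight of a component plus `k / size`), since the exact threshold used here is not spelled out.

```python
from typing import List


class DisjointSet:
    """Union-find that also tracks each component's size and internal difference."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))
        self.size = [1] * n
        self.internal = [0.0] * n  # largest edge weight merged into the component

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int, weight: float) -> None:
        """Join two roots; edges arrive sorted, so `weight` is the new maximum."""
        if self.size[a] < self.size[b]:
            a, b = b, a
        self.parent[b] = a
        self.size[a] += self.size[b]
        self.internal[a] = weight


def segment(image: List[List[float]], k: float = 1.0) -> List[List[int]]:
    """Return a label grid; pixels with the same label form one region."""
    h, w = len(image), len(image[0])
    edges = []
    for y in range(h):
        for x in range(w):
            # Four of the 8 neighbours, so each undirected edge is added once.
            for dy, dx in ((0, 1), (1, -1), (1, 0), (1, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    weight = abs(image[y][x] - image[ny][nx])
                    edges.append((weight, y * w + x, ny * w + nx))
    edges.sort()  # most similar pixel pairs first
    ds = DisjointSet(h * w)
    for weight, a, b in edges:
        ra, rb = ds.find(a), ds.find(b)
        if ra == rb:
            continue
        # Assumed threshold: min over both components of Int(C) + k / |C|.
        threshold = min(ds.internal[ra] + k / ds.size[ra],
                        ds.internal[rb] + k / ds.size[rb])
        if weight <= threshold:
            ds.union(ra, rb, weight)
    return [[ds.find(y * w + x) for x in range(w)] for y in range(h)]
```

A larger `k` makes merging easier and so produces fewer, larger regions. For a 224x224 input the same code applies unchanged, although a vectorised version would be much faster.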

Selective Search

  1. Run Efficient Graph-Based Image Segmentation, and create a color histogram and texture histogram for each proposed region
  2. Get the similarity of all neighbouring sets. The similarity is obtained by comparing the color histograms, texture histograms, size and by computing the fill
  3. Sort the pairs of neighbouring regions by similarity, most similar first.
  4. Repeatedly join the two most similar regions, storing the bounding box that encloses both of them at each iteration. In the end, only one region is left, consisting of the whole image.
  5. Get rid of all bounding boxes smaller than a certain threshold. The remaining bounding boxes are returned.
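
As a rough sketch of the similarity and merge steps above: the formulas below follow the measures described in the Selective Search paper (histogram intersection for color and texture, plus size and fill terms), summed with equal weights. The `Region` class, its field names, and the equal weighting are assumptions for illustration and may differ from what this repository does.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), x2/y2 exclusive (assumed)


@dataclass
class Region:
    box: Box
    size: int                  # number of pixels in the region
    color_hist: List[float]    # assumed L1-normalised
    texture_hist: List[float]  # assumed L1-normalised


def histogram_intersection(a: List[float], b: List[float]) -> float:
    return sum(min(x, y) for x, y in zip(a, b))


def enclosing_box(a: Box, b: Box) -> Box:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def similarity(r1: Region, r2: Region, image_size: int) -> float:
    """Sum of color, texture, size and fill similarity (each in [0, 1])."""
    bx = enclosing_box(r1.box, r2.box)
    bb_area = (bx[2] - bx[0]) * (bx[3] - bx[1])
    s_color = histogram_intersection(r1.color_hist, r2.color_hist)
    s_texture = histogram_intersection(r1.texture_hist, r2.texture_hist)
    # Small regions score higher, which encourages merging them early.
    s_size = 1.0 - (r1.size + r2.size) / image_size
    # Fill is high when the two regions leave little empty space in their box.
    s_fill = 1.0 - (bb_area - r1.size - r2.size) / image_size
    return s_color + s_texture + s_size + s_fill


def merge(r1: Region, r2: Region) -> Region:
    """Join two regions; histograms are combined as size-weighted averages."""
    size = r1.size + r2.size

    def mix(h1: List[float], h2: List[float]) -> List[float]:
        return [(r1.size * a + r2.size * b) / size for a, b in zip(h1, h2)]

    return Region(enclosing_box(r1.box, r2.box), size,
                  mix(r1.color_hist, r2.color_hist),
                  mix(r1.texture_hist, r2.texture_hist))
```

Combining histograms as size-weighted averages means a merged region's histograms can be computed without revisiting its pixels, which keeps each merge cheap.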

This basic implementation of the RCNN algorithm does not use the 21 class-specific SVMs or Bounding Box Regression. Furthermore, the images shown below were produced by a model trained for only 7000 epochs instead of 50000 epochs.

Run time is approximately 2.5 to 3 seconds.

[Images: four example detection results]
