Paper summaryleopaillierImprove on [R-CNN](https://arxiv.org/abs/1311.2524) and [SPPnet](https://arxiv.org/abs/1406.4729) with easier and faster training.
Region-based Convolutional Neural Network (R-CNN), basically takes as input and image and several possibles objects (corresponding to Region of Interest) and score each of them.
## Architecture:
The feature map is computed for the whole image and then for each region of interest a new fixed-length feature vector is computed using max-pooling. From it two predictions are made for classification and bounding-box offsets.
[![screen shot 2017-04-14 at 12 46 38 pm](https://cloud.githubusercontent.com/assets/17261080/25041460/6e7cba40-2110-11e7-8650-faae2a6b0a92.png)](https://cloud.githubusercontent.com/assets/17261080/25041460/6e7cba40-2110-11e7-8650-faae2a6b0a92.png)
## Results:
By sharing computation for RoIs of the same image and allowing simple SGD training it really improves performance training although at testing it's still not as fast as YOLO9000.

This method is based on improving the speed of R-CNN \cite{conf/cvpr/GirshickDDM14}
1. Where R-CNN would have two different objective functions, Fast R-CNN combines localization and classification losses into a "multi-task loss" in order to speed up training.
2. It also uses a pooling method based on \cite{journals/pami/HeZR015} called the RoI pooling layer that scales the input so the images don't have to be scaled before being set an an input image to the CNN. "RoI max pooling works by dividing the $h \times w$ RoI window into an $H \times W$ grid of sub-windows of approximate size $h/H \times w/W$ and then max-pooling the values in each sub-window into the corresponding output grid cell."
3. Backprop is performed for the RoI pooling layer by taking the argmax of the incoming gradients that overlap the incoming values.
This method is further improved by the paper "Faster R-CNN" \cite{conf/nips/RenHGS15}