Faster R-CNN: Towards Real-Time Object Detection with Region Proposal NetworksFaster R-CNN: Towards Real-Time Object Detection with Region Proposal NetworksRen, Shaoqing and He, Kaiming and Girshick, Ross B. and Sun, Jian2015
Paper summaryleopaillier_Objective:_ Improve on Fast R-CNN and [SPPnet](https://arxiv.org/abs/1406.4729) by incorporating the region proposal network directly.
_Dataset:_ [PASCAL VOC](http://host.robots.ox.ac.uk/pascal/VOC/) and [COCO](http://mscoco.org/).
Both Fast R-CNN and SPPnet takes as input an image and several possibles objects (corresponding to regions of interest) and score each of them. They are thus two different entities:
1. A region proposal network.
2. A classification/detection network (Fast R-CNN/SSPnet).
First image features are extracted using a state of the art ConvNet, then they are used for both Region proposal and actual detection/classification on those regions.
[![screen shot 2017-04-14 at 2 59 28 pm](https://cloud.githubusercontent.com/assets/17261080/25043807/01a287b6-2123-11e7-944c-01493371df29.png)](https://cloud.githubusercontent.com/assets/17261080/25043807/01a287b6-2123-11e7-944c-01493371df29.png)
By incorporating the region proposal network right after the feature ConvNet its computation cost becomes basically free which leads to an elegant solution (only one network) but more importantly greatly improve speed at test time.
**Object detection** is the task of drawing one bounding box around each instance of the type of object one wants to detect. Typically, image classification is done before object detection. With neural networks, the usual procedure for object detection is to train a classification network, replace the last layer with a regression layer which essentially predicts pixel-wise if the object is there or not. An bounding box inference algorithm is added at last to make a consistent prediction (see [Deep Neural Networks for Object Detection](http://papers.nips.cc/paper/5207-deep-neural-networks-for-object-detection.pdf)).
The paper introduces RPNs (Region Proposal Networks). They are end-to-end trained to generate region proposals.They simoultaneously regress region bounds and bjectness scores at each location on a regular grid.
RPNs are one type of fully convolutional networks. They take an image of any size as input and output a set of rectangular object proposals, each with an objectness score.
## See also
* [Fast R-CNN](http://www.shortscience.org/paper?bibtexKey=conf/iccv/Girshick15#joecohen)
* [Faster R-CNN](http://www.shortscience.org/paper?bibtexKey=conf/nips/RenHGS15#martinthoma)
* [Mask R-CNN](http://www.shortscience.org/paper?bibtexKey=journals/corr/HeGDG17)