Paper summary

Improves on [R-CNN](https://arxiv.org/abs/1311.2524) and [SPPnet](https://arxiv.org/abs/1406.4729) with easier and faster training.
Region-based Convolutional Neural Network (R-CNN) basically takes as input an image and several possible object proposals (corresponding to Regions of Interest, RoIs) and scores each of them.
The convolutional feature map is computed once for the whole image; then, for each Region of Interest, a fixed-length feature vector is extracted using max-pooling (RoI pooling). From this vector, two predictions are made: classification scores and bounding-box offsets.
[![screen shot 2017-04-14 at 12 46 38 pm](https://cloud.githubusercontent.com/assets/17261080/25041460/6e7cba40-2110-11e7-8650-faae2a6b0a92.png)](https://cloud.githubusercontent.com/assets/17261080/25041460/6e7cba40-2110-11e7-8650-faae2a6b0a92.png)
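The RoI max-pooling step above can be sketched in a few lines of NumPy. This is a minimal illustration under my own assumptions (function name, integer grid splitting, and a square output grid are mine; the paper uses e.g. a 7x7 grid for VGG16), not the paper's implementation:

```python
import numpy as np

def roi_max_pool(feature_map, roi, output_size=(7, 7)):
    """Max-pool a rectangular RoI of a conv feature map into a fixed grid.

    feature_map: (C, H, W) array; roi: (y0, x0, y1, x1) in feature-map
    coordinates. Assumes the RoI is at least output_size in each dimension.
    """
    y0, x0, y1, x1 = roi
    region = feature_map[:, y0:y1, x0:x1]
    C, h, w = region.shape
    oh, ow = output_size
    # Split the RoI into an oh x ow grid of roughly equal cells and take the
    # max over each cell: every RoI, whatever its size, yields a feature
    # vector of the same fixed length.
    ys = np.linspace(0, h, oh + 1).astype(int)
    xs = np.linspace(0, w, ow + 1).astype(int)
    out = np.empty((C, oh, ow), dtype=feature_map.dtype)
    for i in range(oh):
        for j in range(ow):
            cell = region[:, ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            out[:, i, j] = cell.max(axis=(1, 2))
    return out.reshape(-1)  # fixed-length vector fed to the fc layers
```

Because the pooling reads from one shared feature map, all RoIs of an image reuse the same convolutional computation.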
By sharing computation across RoIs of the same image and allowing simple SGD training, it greatly speeds up training, although at test time it's still not as fast as YOLO9000.
This paper is awesome in that it is full of content.
They replace W with its truncated SVD (TSVD). When t, the reduced rank, is small, this saves computation: multiplying by two smaller matrices costs less than multiplying by one big matrix.
In terms of units in hidden layers, they turn n->m into n->t->m, cutting the cost from nm multiply-adds to t(n+m) — a saving whenever t < nm/(n+m).
This only works for the forward pass, though. If you were to train this, you would only learn a rank-t matrix, in which case there would be no reason to have the t->m layer — unless you want more nonlinearities at lower rank, which I haven't seen before.
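A minimal NumPy sketch of the factorization (function and variable names are mine): W ≈ U_t Σ_t V_t^T, implemented as an n->t layer followed by a t->m layer.

```python
import numpy as np

def truncated_svd_layers(W, t):
    """Factor an (m, n) weight matrix W into two rank-t layers.

    y = W @ x is replaced by y ≈ B @ (A @ x), where A is the n->t layer
    and B is the t->m layer.
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = np.diag(S[:t]) @ Vt[:t]   # shape (t, n): n -> t
    B = U[:, :t]                  # shape (m, t): t -> m
    return B, A

# Cost per input drops from m*n to t*(m + n) multiply-adds.
m, n, t = 100, 400, 20
W = np.random.randn(m, n)
B, A = truncated_svd_layers(W, t)
x = np.random.randn(n)
y_approx = B @ (A @ x)  # approximates W @ x
```

Since the SVD keeps the t largest singular values, this is the best rank-t approximation of W in the Frobenius norm, which is why the forward-pass accuracy degrades gracefully as t shrinks.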