Paper summary by henryzlo

This paper is awesome in that it is full of content.
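They replace the weight matrix W of the big fully connected layers with its truncated SVD (TSVD), $W \approx U_t \Sigma_t V_t^T$, keeping only the top t singular values. When t, the reduced rank, is small, this saves computation: for an n->m layer you do two small matrix multiplies costing about t(n+m) operations instead of one big one costing nm.
In terms of units in hidden layers, they turn n->m into n->t->m, with no nonlinearity between the two new layers.
This only works for the forward pass though: W is compressed after training, to speed up detection. If you were to train the factorized layers, you would only ever learn a rank t matrix, in which case there would be no reason to have the t->m layer. Unless you want more nonlinearities but less rank; I haven't seen that before.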
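A minimal NumPy sketch of the compression, assuming W maps n inputs to m outputs (so it is m x n); the function names and sizes are hypothetical. Following the paper's description, the first layer gets $\Sigma_t V_t^T$ with no bias and the second gets $U_t$ with W's original bias.

```python
import numpy as np

def tsvd_factor(W, t):
    """Split an m x n weight matrix W (an n -> m layer) into two smaller layers.

    Keeps the top-t singular triplets: W ~= U_t @ diag(s_t) @ Vt_t.
    Returns W1 = diag(s_t) @ Vt_t, shape (t, n), and W2 = U_t, shape (m, t).
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    W1 = s[:t, None] * Vt[:t]  # first layer: n -> t, no bias
    W2 = U[:, :t]              # second layer: t -> m, reuses W's bias
    return W1, W2

def fc_tsvd(x, W1, W2, b):
    # Two linear layers with no nonlinearity in between: n -> t -> m.
    return W2 @ (W1 @ x) + b

rng = np.random.default_rng(0)
m, n, t = 64, 256, 16  # hypothetical layer sizes
W = rng.standard_normal((m, n))
b = rng.standard_normal(m)
W1, W2 = tsvd_factor(W, t)
print(W.size, W1.size + W2.size)  # 16384 5120, i.e. m*n vs t*(m+n)
```

With t = min(m, n) the factorization is exact. How small t can go without hurting accuracy depends on how fast the trained layer's singular values decay, which a random W like this one does not model.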

Fast RCNN is a region-proposal-based detection net for object detection tasks.
##### Input & Output
The input to a Fast RCNN is an image together with a set of region proposals (RoIs) for it, generated using Selective Search. For each RoI the net has 2 outputs: a probability distribution over all object classes plus background (e.g. 21 classes for PASCAL VOC'12: 20 object classes + background), and the corresponding bounding box parameters for each object class.
##### Architecture
Turning a pretrained deep net (e.g. VGG16) into a Fast RCNN takes 3 major modifications:
1. The last max pooling layer is replaced by an ROI pooling layer, whose fixed-size output feeds the first fully connected layer.
2. The final FC layer (and its softmax) is replaced by 2 sibling layers: one gives a softmax output of class probabilities, the other predicts, for each class, an encoding of the 4 bounding box parameters (x, y, width, height) relative to the region proposal.
3. The input is modified to take 2 inputs: the images and their corresponding proposals.
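The "encoding" of the box parameters follows R-CNN's parameterization: a scale-invariant shift of the box center plus a log-space change of width and height, all relative to the proposal. A minimal sketch, assuming boxes are given as (center x, center y, width, height); `encode_box` and `decode_box` are hypothetical names.

```python
import math

def encode_box(proposal, gt):
    """Regression targets (tx, ty, tw, th) for ground-truth box `gt` w.r.t. `proposal`.

    Both boxes are (cx, cy, w, h): center x, center y, width, height.
    """
    px, py, pw, ph = proposal
    gx, gy, gw, gh = gt
    return (
        (gx - px) / pw,     # center shift, in units of proposal width
        (gy - py) / ph,     # center shift, in units of proposal height
        math.log(gw / pw),  # log-space width ratio
        math.log(gh / ph),  # log-space height ratio
    )

def decode_box(proposal, deltas):
    """Inverse of encode_box: apply predicted deltas to a proposal."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = deltas
    return (px + tx * pw, py + ty * ph, pw * math.exp(tw), ph * math.exp(th))

proposal = (50.0, 50.0, 20.0, 40.0)  # hypothetical boxes
gt = (55.0, 40.0, 40.0, 40.0)
deltas = encode_box(proposal, gt)
print(deltas)  # (0.25, -0.25, log 2 ~= 0.693, 0.0)
```

Because the targets are relative to the proposal's size, the same regressor output means the same relative correction for small and large proposals.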
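**ROI Pooling layer** - The most notable contribution of the paper. It max-pools the features inside each proposed region into a fixed-size map (7 x 7 for the VGG16 version of Fast RCNN), whatever the size of the region. The intuition behind the layer is speed compared to SPPnet and R-CNN: like SPPnet, it lets the conv features be computed once per image and shared by all proposals, instead of running the CNN on every warped proposal as R-CNN does, and it is essentially a single-level version of SPPnet's spatial pyramid pooling.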
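To make the fixed-size output concrete, here is a minimal NumPy sketch of RoI max pooling for one RoI: the h x w window is split into an out_h x out_w grid and each sub-window is max-pooled per channel. Assumptions: the RoI is already given in integer, inclusive feature-map coordinates (the projection from image coordinates is skipped), and bin edges are rounded outward with floor/ceil; `roi_max_pool` is a hypothetical name.

```python
import numpy as np

def ceil_div(a, b):
    """Integer ceiling of a / b for positive b."""
    return -(-a // b)

def roi_max_pool(features, roi, out_h=7, out_w=7):
    """Max-pool the features inside one RoI into a fixed out_h x out_w grid.

    features: array of shape (C, H, W), the conv feature map of one image.
    roi: (x0, y0, x1, y1), inclusive integer coordinates on the feature map.
    Returns an array of shape (C, out_h, out_w), whatever the RoI size.
    """
    x0, y0, x1, y1 = roi
    h, w = y1 - y0 + 1, x1 - x0 + 1
    out = np.empty((features.shape[0], out_h, out_w), dtype=features.dtype)
    for i in range(out_h):
        # Row range of sub-window i: about h / out_h rows, rounded outward.
        ys, ye = y0 + (i * h) // out_h, y0 + ceil_div((i + 1) * h, out_h)
        for j in range(out_w):
            xs, xe = x0 + (j * w) // out_w, x0 + ceil_div((j + 1) * w, out_w)
            # Max over the spatial sub-window, separately for every channel.
            out[:, i, j] = features[:, ys:ye, xs:xe].max(axis=(1, 2))
    return out

# Two differently sized RoIs on the same (hypothetical) 512-channel map
# both come out as 512 x 7 x 7, ready for the first fully connected layer.
fmap = np.random.default_rng(0).standard_normal((512, 14, 14))
print(roi_max_pool(fmap, (0, 0, 13, 13)).shape)  # (512, 7, 7)
print(roi_max_pool(fmap, (2, 3, 6, 7)).shape)    # (512, 7, 7)
```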
##### Results
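The net is trained with a multi-task loss: log loss on the class probability output plus a smooth L1 loss on the bounding box parameters, which the paper prefers over squared error because it is less sensitive to outliers.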
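For a single RoI with true class u, the combined loss is the log loss on $p_u$ plus, for non-background RoIs only, a smooth L1 loss (quadratic near zero, linear beyond, so less sensitive to outliers than plain squared error) on that class's box offsets. A minimal pure-Python sketch; the function names and list-based inputs are hypothetical, and the weight `lam = 1` follows the paper's default.

```python
import math

def smooth_l1(x):
    # Quadratic for |x| < 1, linear for |x| >= 1.
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5

def multitask_loss(class_probs, u, box_deltas, box_target, lam=1.0):
    """L = -log p_u + lam * [u >= 1] * sum_i smooth_l1(t^u_i - v_i) for one RoI.

    class_probs: softmax output over K + 1 classes, index 0 = background.
    u: ground-truth class of the RoI.
    box_deltas: list indexed by class of predicted (tx, ty, tw, th);
                only box_deltas[u] is used (entry 0 is never read).
    box_target: regression target v = (vx, vy, vw, vh).
    """
    loss_cls = -math.log(class_probs[u])
    if u == 0:  # background RoIs get no localization loss
        return loss_cls
    loss_loc = sum(smooth_l1(t - v) for t, v in zip(box_deltas[u], box_target))
    return loss_cls + lam * loss_loc

probs = [0.1, 0.7, 0.2]  # hypothetical: background + 2 object classes
deltas = [None, (0.5, -0.5, 2.0, 0.0), (0.0, 0.0, 0.0, 0.0)]
print(multitask_loss(probs, 1, deltas, (0.0, 0.0, 0.0, 0.0)))  # -log 0.7 + 1.75
```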
The results were very impressive on the VOC '07, '10 and '12 datasets, with Fast RCNN outperforming the other nets in terms of mAP.

This method improves the speed of R-CNN \cite{conf/cvpr/GirshickDDM14}:
1. Where R-CNN trains its classifier and bounding-box regressors with separate objectives in separate stages, Fast R-CNN combines the localization and classification losses into a single "multi-task loss", so the network is trained in one stage, which speeds up training.
2. It also uses a pooling method based on \cite{journals/pami/HeZR015}, called the RoI pooling layer, which pools each region's features to a fixed size, so the regions don't have to be warped to a fixed size before being fed to the CNN. "RoI max pooling works by dividing the $h \times w$ RoI window into an $H \times W$ grid of sub-windows of approximate size $h/H \times w/W$ and then max-pooling the values in each sub-window into the corresponding output grid cell."
3. Backprop through the RoI pooling layer routes each output's gradient to the input that was the max (the argmax) in the forward pass; an input that was the argmax for several, possibly overlapping, RoIs receives the sum of their gradients.
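To see the argmax routing and the accumulation for overlapping RoIs, here is a single-channel NumPy sketch: the forward pass records, for every output cell, the flat index of the input it took its max from, and the backward pass scatters the output gradients back to those indices with `np.add.at`, which sums repeated indices. Integer, inclusive feature-map RoIs and outward-rounded bins are assumptions, and the function names are hypothetical.

```python
import numpy as np

def ceil_div(a, b):
    return -(-a // b)

def roi_pool_forward(x, rois, out_h, out_w):
    """Single-channel RoI max pooling that also records argmax positions.

    x: 2-D feature map; rois: list of inclusive integer (x0, y0, x1, y1).
    Returns pooled y of shape (R, out_h, out_w) and, for each output cell,
    the flat index into x of the input that produced the max.
    """
    y = np.empty((len(rois), out_h, out_w), dtype=x.dtype)
    argmax = np.empty((len(rois), out_h, out_w), dtype=np.int64)
    for r, (x0, y0, x1, y1) in enumerate(rois):
        h, w = y1 - y0 + 1, x1 - x0 + 1
        for i in range(out_h):
            ys, ye = y0 + (i * h) // out_h, y0 + ceil_div((i + 1) * h, out_h)
            for j in range(out_w):
                xs, xe = x0 + (j * w) // out_w, x0 + ceil_div((j + 1) * w, out_w)
                window = x[ys:ye, xs:xe]
                wy, wx = np.unravel_index(np.argmax(window), window.shape)
                argmax[r, i, j] = np.ravel_multi_index((ys + wy, xs + wx), x.shape)
                y[r, i, j] = window[wy, wx]
    return y, argmax

def roi_pool_backward(grad_y, argmax, x_shape):
    """dL/dx_i = sum over RoIs r and cells j with argmax[r, j] == i of dL/dy[r, j]."""
    grad_x = np.zeros(int(np.prod(x_shape)))
    np.add.at(grad_x, argmax.ravel(), grad_y.ravel())  # repeated indices are summed
    return grad_x.reshape(x_shape)

x = np.arange(16.0).reshape(4, 4)
rois = [(0, 0, 3, 3), (2, 2, 3, 3)]  # the second RoI overlaps the first
y, idx = roi_pool_forward(x, rois, 2, 2)
grad_x = roi_pool_backward(np.ones_like(y), idx, x.shape)
print(grad_x)  # x[3, 3] = 15 is the max for both RoIs, so it gets gradient 2
```

Inputs that were never selected as a max get zero gradient, which is what makes the layer's backward pass cheap.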
This method is further improved by the paper "Faster R-CNN" \cite{conf/nips/RenHGS15}