SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size. Iandola, Forrest N.; Moskewicz, Matthew W.; Ashraf, Khalid; Han, Song; Dally, William J.; Keutzer, Kurt. 2016.

Paper summary by martinthoma

This paper is about reducing the number of model parameters while maintaining (most of) the model's accuracy.
The paper also gives a nice overview of some key findings about CNNs. One part that is especially interesting is section "2.4. Neural Network Design Space Exploration".
## Model compression
Key ideas for model compression are:
* singular value decomposition (SVD)
* replacing parameters that are below a certain threshold with zeros to form a sparse matrix (pruning)
* combining network pruning with quantization (to 8 bits or fewer)
* Huffman encoding (Deep Compression)
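The pruning-to-sparsity idea from the list above can be sketched in a few lines of NumPy; the weight values and the threshold are purely illustrative, not taken from any of the cited papers:

```python
import numpy as np

# Magnitude-based pruning: replace weights whose absolute value is below
# a threshold with zeros, turning the weight matrix into a sparse one.
weights = np.array([[0.80, -0.05, 0.30],
                    [0.02, -0.90, 0.10]])
threshold = 0.25  # illustrative value
pruned = np.where(np.abs(weights) < threshold, 0.0, weights)
# pruned:
# [[ 0.8  0.   0.3]
#  [ 0.  -0.9  0. ]]
sparsity = float(np.mean(pruned == 0.0))  # fraction of zero entries
```

The zeroed matrix can then be stored in a sparse format, which is where the size reduction comes from.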
The ideas used by this paper are:
* replacing 3x3 filters with 1x1 filters
* decreasing the number of input channels by using **squeeze layers**
One key idea for maintaining high accuracy is to downsample late in the network. This means that layers close to the input have stride 1, while later layers have stride > 1.
## Fire module
A Fire module is a squeeze convolution layer (which has only $n_1$ 1x1 filters) feeding into an expand layer that has a mix of $n_2$ 1x1 and $n_3$ 3x3 convolution filters. The numbers are chosen such that
$$n_1 < n_2 + n_3,$$
so the squeeze layer limits the number of input channels seen by the expand layer's 3x3 filters, which is what saves parameters.
(to be continued)
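A quick sanity check of the parameter savings, using the fire2 configuration from the paper (96 input channels, $n_1 = 16$, $n_2 = n_3 = 64$; biases ignored). This is a sketch, not the authors' code:

```python
def conv_params(in_channels, num_filters, k):
    """Weights of a k x k convolutional layer, ignoring biases."""
    return in_channels * num_filters * k * k

def fire_params(in_channels, n1, n2, n3):
    """Squeeze layer (n1 1x1 filters) followed by an expand layer
    (n2 1x1 filters and n3 3x3 filters, both reading the n1 channels)."""
    squeeze = conv_params(in_channels, n1, 1)
    expand = conv_params(n1, n2, 1) + conv_params(n1, n3, 3)
    return squeeze + expand

fire = fire_params(96, 16, 64, 64)  # 11776 parameters
plain = conv_params(96, 128, 3)     # 110592 for a plain 3x3 layer
                                    # with the same 128 output channels
```

That is roughly a 9x reduction for this one module, before any additional compression.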

$\bf Summary:$
The paper is about squeezing the number of parameters in a convolutional neural network. The number of parameters in a convolutional layer is given by (number of input channels)$\times$(number of filters)$\times$(size of filter$\times$size of filter).
The paper proposes two strategies: (i) replace 3x3 filters with 1x1 filters and (ii) decrease the number of input channels. They assume the number of filters is given, i.e., they do not tinker with it. A decrease in the number of parameters can lead to lower accuracy; to compensate, the authors propose to downsample late in the network.
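Both strategies fall directly out of the parameter formula above; a small illustration with made-up channel counts:

```python
def conv_params(in_channels, num_filters, k):
    # (number of input channels) * (number of filters) * (k * k)
    return in_channels * num_filters * k * k

base = conv_params(256, 256, 3)         # plain 3x3 layer: 589824 weights
strategy_i = conv_params(256, 256, 1)   # (i) 3x3 -> 1x1: 9x fewer parameters
strategy_ii = conv_params(32, 256, 3)   # (ii) squeeze inputs to 32 channels:
                                        # 8x fewer parameters
```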
The results are quite impressive: compared to AlexNet, they achieve a 50x reduction in model size while preserving accuracy. The model can be further compressed with existing methods like Deep Compression, which are orthogonal to this paper's approach; in total this gives around a 510x reduction while still preserving AlexNet's accuracy.
$\bf Question$: The impact on running time (especially the feed-forward phase, which may be more typical on embedded devices) is not clear to me. Is it certain to be reduced as well, or at least to be *no worse* than for the baseline models?
