SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size
Iandola, Forrest N. and Moskewicz, Matthew W. and Ashraf, Khalid and Han, Song and Dally, William J. and Keutzer, Kurt, 2016
Paper summary (niz)
While preserving accuracy:
- The improved network architecture cuts the parameter count about 50x (model size 240MB to 4.8MB).
- Deep Compression shrinks the model a further ~10x (4.8MB to 0.47MB).
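As a quick sanity check, the reduction factors can be recomputed from the model sizes quoted above. The product of the two steps comes out at roughly the 510x overall reduction versus AlexNet that the paper reports; the only inputs are the three sizes from the list.

```python
# Model sizes (MB) quoted in the summary above.
ALEXNET_MB = 240.0
SQUEEZENET_MB = 4.8
SQUEEZENET_COMPRESSED_MB = 0.47  # after Deep Compression


def reduction(before_mb: float, after_mb: float) -> float:
    """Return how many times smaller `after_mb` is than `before_mb`."""
    return before_mb / after_mb


architecture = reduction(ALEXNET_MB, SQUEEZENET_MB)              # architecture change alone
compression = reduction(SQUEEZENET_MB, SQUEEZENET_COMPRESSED_MB)  # Deep Compression on top
overall = reduction(ALEXNET_MB, SQUEEZENET_COMPRESSED_MB)        # both combined

print(f"architecture {architecture:.1f}x, compression {compression:.1f}x, overall {overall:.0f}x")
```

The two factors multiply, which is why a 50x architectural gain plus a ~10x compression gain lands near 510x.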
Adding simple bypass (shortcut) connections improves accuracy even further, by about 2%.
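The paper presents three insightful architectural design strategies:
1. Replace 3x3 filters with 1x1 filters to reduce model size (a 1x1 filter has 9x fewer parameters than a 3x3 filter).
2. Decrease the number of input channels to the remaining 3x3 filters, which also reduces model size.
3. Downsample late in the network so that convolution layers have larger activation maps, which leads to higher accuracy.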
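Strategies 1 and 2 both follow from how a convolution layer's weight count scales: input channels × number of filters × kernel height × kernel width. The sketch below counts weights (biases ignored) for a hypothetical layer with 128 inputs and 128 filters; the layer sizes are illustrative, not taken from the paper. Strategy 3 does not change the parameter count, so it is not shown here.

```python
def conv_params(in_channels: int, num_filters: int, kernel_size: int) -> int:
    """Number of weights in a conv layer with square kernels (biases ignored)."""
    return in_channels * num_filters * kernel_size * kernel_size


# Hypothetical layer: 128 input channels -> 128 filters.
baseline = conv_params(128, 128, 3)    # every filter is 3x3
strategy_1 = conv_params(128, 128, 1)  # strategy 1: swap the 3x3 filters for 1x1 filters
strategy_2 = conv_params(16, 128, 3)   # strategy 2: keep 3x3, but feed it only 16 input channels

print(baseline, strategy_1, strategy_2)                # 147456 16384 18432
print(baseline // strategy_1, baseline // strategy_2)  # 9 8
```

Strategy 1 gives a fixed 9x saving per replaced filter, while strategy 2 scales linearly with how aggressively the input channels are reduced.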
It also gives useful insights into CNN design space exploration by parametrizing the microarchitecture:
- The squeeze ratio, to find a good balance between model size and accuracy.
- The percentage of 3x3 filters, to find how many of them are enough.
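A minimal sketch of how these two knobs set the width of every Fire module and, through that, the weight count. Assumptions: the names `base_e`, `incr_e` and `freq` for the expand-width schedule, the default values (chosen so that a squeeze ratio of 0.125 and 50% 3x3 filters give widths from 16/64/64 up to 64/256/256, as in the original SqueezeNet as I understand it), and 96 input channels into the first Fire module.

```python
def fire_widths(num_modules=8, base_e=128, incr_e=128, freq=2,
                pct_3x3=0.5, squeeze_ratio=0.125):
    """Return (s_1x1, e_1x1, e_3x3) for each Fire module."""
    widths = []
    for i in range(num_modules):
        e = base_e + incr_e * (i // freq)  # expand width grows every `freq` modules
        e_3x3 = int(e * pct_3x3)           # fraction of expand filters that are 3x3
        e_1x1 = e - e_3x3
        s_1x1 = int(e * squeeze_ratio)     # squeeze width = squeeze ratio * expand width
        widths.append((s_1x1, e_1x1, e_3x3))
    return widths


def fire_stack_params(widths, in_channels=96):
    """Total weights of a chain of Fire modules (biases ignored)."""
    total = 0
    for s, e1, e3 in widths:
        total += in_channels * s + s * e1 + s * e3 * 9  # squeeze + 1x1 expand + 3x3 expand
        in_channels = e1 + e3                           # expand outputs are concatenated
    return total


for sr in (0.125, 0.25, 0.5, 0.75):
    print(f"SR={sr}: {fire_stack_params(fire_widths(squeeze_ratio=sr))} weights")
```

Sweeping `squeeze_ratio` (or `pct_3x3`) this way makes the size side of the trade-off explicit; the accuracy side still has to be measured by training each variant.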
This paper is about reducing the number of model parameters while maintaining most of the model's accuracy.
The paper gives a nice overview of some key findings about CNNs. One especially interesting part is section 2.4, "Neural Network Design Space Exploration".
## Model compression
Key ideas for model compression are:
* singular value decomposition (SVD)
* network pruning: replace parameters whose magnitude is below a certain threshold with zeros to form a sparse matrix
* combining network pruning with quantization (to 8 bits or less)
* Huffman encoding (Deep Compression)
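The pruning and quantization ideas from the list can be sketched in a few lines of pure Python. This is a simplified stand-in: it uses uniform linear quantization rather than the k-means weight sharing Deep Compression uses, and it leaves out SVD and Huffman encoding. The threshold, bit width and weight values are hypothetical.

```python
def prune(weights, threshold):
    """Magnitude pruning: keep weights with |w| >= threshold as a sparse {index: value} dict."""
    return {i: w for i, w in enumerate(weights) if abs(w) >= threshold}


def quantize(sparse, bits=8):
    """Uniform linear quantization of the surviving weights to integer codes.

    Returns (codes, scale, offset) such that w is approximately offset + code * scale.
    """
    if not sparse:
        return {}, 1.0, 0.0
    lo, hi = min(sparse.values()), max(sparse.values())
    levels = 2 ** bits - 1                          # e.g. 255 steps for 8 bits
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = {i: round((w - lo) / scale) for i, w in sparse.items()}
    return codes, scale, lo


def dequantize(codes, scale, offset, length):
    """Rebuild a dense weight list; pruned positions come back as exact zeros."""
    dense = [0.0] * length
    for i, code in codes.items():
        dense[i] = offset + code * scale
    return dense


weights = [0.9, -0.02, 0.31, 0.004, -0.75, 0.05, -0.4, 0.01]
sparse = prune(weights, threshold=0.1)
codes, scale, offset = quantize(sparse, bits=8)
restored = dequantize(codes, scale, offset, len(weights))
print(sorted(sparse))  # [0, 2, 4, 6]
print(max(abs(weights[i] - restored[i]) for i in sparse))
```

Only the surviving indices and their small integer codes need to be stored, which is where the size saving comes from; each restored weight is off by at most half a quantization step.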
Ideas used by this paper are:
* Replacing 3x3 filters with 1x1 filters
* Decreasing the number of input channels by using **squeeze layers**
One key idea for maintaining high accuracy is to downsample late in the network: layers close to the input have stride 1, and the layers with stride > 1 are concentrated toward the end of the network.
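The effect of where the strides sit can be seen by tracking the spatial size of the activation maps. The sketch below assumes a hypothetical six-layer network on a 224x224 input with "same" padding, so each layer maps a side length `n` to `ceil(n / stride)`; both variants end at the same resolution, but with late downsampling most layers work on much larger maps.

```python
def activation_sizes(input_size, strides):
    """Spatial side length after each layer, assuming 'same' padding (size -> ceil(size / stride))."""
    sizes = []
    size = input_size
    for stride in strides:
        size = -(-size // stride)  # ceiling division
        sizes.append(size)
    return sizes


early = activation_sizes(224, [2, 2, 2, 1, 1, 1])  # downsample early
late = activation_sizes(224, [1, 1, 1, 2, 2, 2])   # downsample late
print(early)  # [112, 56, 28, 28, 28, 28]
print(late)   # [224, 224, 224, 112, 56, 28]
```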
## Fire module
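A Fire module is a squeeze convolution layer (which has only $n_1$ 1x1 filters) feeding into an expand layer that has a mix of $n_2$ 1x1 and $n_3$ 3x3 convolution filters. The filter counts are chosen such that
$$n_1 < n_2 + n_3$$
(Why? This way the squeeze layer limits the number of input channels to the 3x3 filters, i.e. design strategy 2.)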
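To see what the constraint buys, the sketch below counts the weights (biases ignored) of a Fire module with $C$ input channels and compares it with a plain 3x3 convolution producing the same $n_2 + n_3$ output channels (the two expand branches are concatenated). The example values $C = 96$, $n_1 = 16$, $n_2 = n_3 = 64$ resemble the first Fire module of the original SqueezeNet but should be treated as illustrative; the function names are hypothetical.

```python
def fire_params(in_channels, n1, n2, n3):
    """Weights of a Fire module (biases ignored).

    n1: 1x1 squeeze filters, n2: 1x1 expand filters, n3: 3x3 expand filters.
    """
    if not n1 < n2 + n3:
        raise ValueError("expected n1 < n2 + n3 (squeeze narrower than expand)")
    squeeze = in_channels * n1                 # 1x1 squeeze convolution
    expand = n1 * n2 + n1 * n3 * 3 * 3         # both expand branches read only n1 channels
    return squeeze + expand


def plain_conv3x3_params(in_channels, out_channels):
    """Weights of an ordinary 3x3 convolution with the same output width."""
    return in_channels * out_channels * 3 * 3


fire = fire_params(96, 16, 64, 64)
plain = plain_conv3x3_params(96, 64 + 64)
print(fire, plain)  # 11776 110592
```

Because the 3x3 filters only see the $n_1$ squeezed channels instead of all $C$ inputs, this Fire module needs roughly 9x fewer weights than the plain 3x3 layer.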
(to be continued)