Squeeze-and-Excitation Networks
Jie Hu, Li Shen, and Gang Sun, 2017
Paper summary by joecohen
"The SE module can learn some nonlinear global interactions already known to be useful, such as spatial normalization. The channel wise weights make it somewhat more powerful than divisive normalization as it can learn feature-specific inhibitions (ie: if we see a lot of flower parts, the probability of boat features should be diminished). It also has some similarity to bio inhibitory circuits." By jcannell on reddit
Summary by the author Jie Hu:
Our motivation is to explicitly model the interdependence between feature channels. We do not introduce a new spatial dimension for fusing feature channels; instead, we propose a new "feature recalibration" strategy. Specifically, the network learns to estimate the importance of each feature channel automatically, and then uses this importance to enhance useful features and suppress features that are not useful for the current task.
The above figure is a schematic diagram of our proposed SE module. Given an input $x$ with $c_1$ feature channels, a series of convolutions and other general transformations produces features with $c_2$ channels. Unlike traditional CNNs, we then recalibrate these features through the following three operations.
The first is the Squeeze operation: we compress the features along the spatial dimensions, turning each two-dimensional feature channel into a single real number. This real number has, in a sense, a global receptive field, and the output dimension matches the number of input channels. It characterizes the global distribution of responses over the feature channels and gives even layers close to the input access to a global receptive field, which is very useful in many tasks.
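The Squeeze step can be sketched in a few lines. This is a minimal illustration, assuming the squeeze is a global average over each channel (the choice used in the SE paper); the function name `squeeze`, the nested-list layout `[channel][row][col]`, and the numbers are made up for the example.

```python
def squeeze(feature_maps):
    """Global average pooling: reduce each 2-D channel to one scalar.

    feature_maps is a nested list indexed as [channel][row][col]
    (an assumed layout for this sketch).
    """
    descriptors = []
    for channel in feature_maps:
        values = [v for row in channel for v in row]
        descriptors.append(sum(values) / len(values))
    return descriptors


# Two channels of size 2x2 with illustrative values.
u = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 8.0]],
]
z = squeeze(u)
print(z)  # [2.5, 2.0]
```

The output has one entry per input channel, which is the sense in which the output dimension matches the number of channels.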
Next is the Excitation operation, a mechanism similar to the gates in a recurrent neural network. A weight is generated for each feature channel through the parameter $w$, which is learned to explicitly model the correlation between feature channels.
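A rough sketch of the gating idea follows. It assumes the form used in the SE paper, a small two-layer bottleneck with a ReLU followed by a sigmoid, and it omits bias terms for brevity. The names `excite` and `rescale`, the weight matrices `w1` and `w2`, and all numeric values are hypothetical. The final per-channel multiplication is included only to show how the gates would be applied to the feature maps.

```python
import math


def excite(z, w1, w2):
    """Map channel descriptors z to per-channel gates in (0, 1).

    w1 has shape [hidden][channels], w2 has shape [channels][hidden].
    Biases are omitted in this sketch.
    """
    # Bottleneck layer with ReLU.
    hidden = [max(0.0, sum(w * x for w, x in zip(row, z))) for row in w1]
    # Expand back to one logit per channel.
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in w2]
    # Sigmoid gate, so channels are weighted independently.
    return [1.0 / (1.0 + math.exp(-a)) for a in logits]


def rescale(feature_maps, gates):
    """Multiply every value in channel c by gates[c]."""
    return [
        [[g * v for v in row] for row in channel]
        for channel, g in zip(feature_maps, gates)
    ]


# Hypothetical descriptors and weights: 2 channels, 1 hidden unit.
z = [2.5, 2.0]
w1 = [[1.0, -1.0]]    # hidden = relu(2.5 - 2.0) = 0.5
w2 = [[2.0], [-2.0]]  # logits = [1.0, -1.0]
s = excite(z, w1, w2)

u = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 8.0]],
]
x_tilde = rescale(u, s)
```

With these made-up weights the first channel receives a gate above 0.5 and the second a gate below 0.5, so the first channel is kept largely intact while the second is damped.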
Reddit thread: https://www.reddit.com/r/MachineLearning/comments/6pt99z/r_squeezeandexcitation_networks_ilsvrc_2017/
First published: 2017/09/05
Abstract: Convolutional neural networks are built upon the convolution operation, which
extracts informative features by fusing spatial and channel-wise information
together within local receptive fields. In order to boost the representational
power of a network, much existing work has shown the benefits of enhancing
spatial encoding. In this work, we focus on channels and propose a novel
architectural unit, which we term the "Squeeze-and-Excitation" (SE) block, that
adaptively recalibrates channel-wise feature responses by explicitly modelling
interdependencies between channels. We demonstrate that by stacking these
blocks together, we can construct SENet architectures that generalise extremely
well across challenging datasets. Crucially, we find that SE blocks produce
significant performance improvements for existing state-of-the-art deep
architectures at slight computational cost. SENets formed the foundation of our
ILSVRC 2017 classification submission which won first place and significantly
reduced the top-5 error to 2.251%, achieving a 25% relative improvement over
the winning entry of 2016.