Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift on ShortScience.org

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Sergey Ioffe and Christian Szegedy
arXiv e-Print archive - 2015 via Local arXiv
Keywords: cs.LG


Summary by Léo Paillier 6 years ago

Network training is very sensitive to the learning rate and to parameter initialization. During training, the distribution of each layer's inputs changes as the parameters of the preceding layers change (the authors call this internal covariate shift), so every layer has to keep adapting to a new input distribution. In this paper the authors introduce batch normalization, a new layer that reduces internal covariate shift.

_Datasets:_ [MNIST](http://yann.lecun.com/exdb/mnist/), [ImageNet](http://www.image-net.org/).

#### Inner workings:

Batch normalization fixes the mean and variance of each layer input by applying the following normalization to every training mini-batch:
[![screen shot 2017-04-13 at 10 21 39 am](https://cloud.githubusercontent.com/assets/17261080/24996464/4027fbba-2033-11e7-966a-2db3c0f1389d.png)](https://cloud.githubusercontent.com/assets/17261080/24996464/4027fbba-2033-11e7-966a-2db3c0f1389d.png)
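The scale and shift parameters $\gamma$ and $\beta$ are then learned by gradient descent, together with the other network parameters.
During inference, the mini-batch statistics are replaced by population statistics: the mean and an unbiased variance estimate computed over the training data rather than over the current batch.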
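
The training and inference steps described above can be sketched in a few lines of plain Python. This is only an illustration for a single scalar activation, assuming the screenshot shows the usual mini-batch transform (subtract the batch mean, divide by the square root of the batch variance plus a small `eps`, then scale by $\gamma$ and shift by $\beta$). The function names `batch_norm_train`, `population_stats` and `batch_norm_infer`, the toy batches, and the default `eps=1e-5` are assumptions made for this example, not names or values taken from the paper.

```python
import math


def batch_norm_train(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one activation over a mini-batch (training mode)."""
    m = len(xs)
    mean = sum(xs) / m
    # Biased (divide by m) mini-batch variance.
    var = sum((x - mean) ** 2 for x in xs) / m
    ys = [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in xs]
    return ys, mean, var


def population_stats(batch_stats, m):
    """Average per-batch (mean, var) pairs collected with batch size m.

    The m / (m - 1) factor turns the averaged biased variances into an
    unbiased estimate.
    """
    n = len(batch_stats)
    pop_mean = sum(mean for mean, _ in batch_stats) / n
    pop_var = (m / (m - 1)) * sum(var for _, var in batch_stats) / n
    return pop_mean, pop_var


def batch_norm_infer(x, pop_mean, pop_var, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a single value with fixed population statistics."""
    return gamma * (x - pop_mean) / math.sqrt(pop_var + eps) + beta


# Toy data (hypothetical): two mini-batches of size 4.
batches = [[1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0]]
stats = []
for batch in batches:
    ys, mean, var = batch_norm_train(batch)
    stats.append((mean, var))

pop_mean, pop_var = population_stats(stats, m=4)
y = batch_norm_infer(3.0, pop_mean, pop_var)
```

In practice, frameworks usually track the population statistics as running averages during training instead of making a separate pass; the explicit averaging here just keeps the sketch short.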

#### Results:

Batch normalization provides several advantages:

1. Allows a higher learning rate without risk of divergence, by stabilizing the gradient scale.
2. Regularizes the model.
3. Reduces the need for dropout.
4. Prevents the network from getting stuck in saturated regimes when using saturating nonlinearities.
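
One way to see why the first point holds: the normalized output of a layer is (almost) unchanged when its weights are rescaled, so a large weight update does not blow up the activations that reach the next layer. The sketch below checks this numerically for a single scalar weight. The helper `normalize`, the input values and the scale factor 100 are assumptions chosen for illustration, and the match is only approximate because of the small `eps` term.

```python
import math


def normalize(xs, eps=1e-5):
    """Zero-mean, unit-variance normalization over a mini-batch."""
    m = len(xs)
    mean = sum(xs) / m
    var = sum((x - mean) ** 2 for x in xs) / m
    return [(x - mean) / math.sqrt(var + eps) for x in xs]


# Hypothetical layer inputs and scalar weight.
inputs = [0.5, -1.2, 3.0, 0.7]
w = 0.8

small = normalize([w * u for u in inputs])
large = normalize([100 * w * u for u in inputs])  # weight scaled by 100

# Largest difference between the two normalized outputs.
max_gap = max(abs(a - b) for a, b in zip(small, large))
```

Here `max_gap` stays tiny even though the pre-activations differ by a factor of 100.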

#### What to do?

1. Add a batch norm layer before each activation layer.
2. Increase the learning rate.
3. Remove dropout.
4. Reduce L2 weight regularization.
5. Accelerate learning rate decay.
6. Reduce image distortion in data augmentation.
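
The first step of this recipe (normalize the pre-activation, then apply the nonlinearity) could look roughly like the following for a one-weight layer. Everything here is a hypothetical sketch: `batch_norm`, `relu`, `layer` and the toy values are made up for the example, and ReLU stands in for whatever activation the network uses. As a side effect, the sketch also shows that a bias added before batch norm is cancelled by the mean subtraction, so $\beta$ can take over its role.

```python
import math


def batch_norm(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Mini-batch normalization followed by a learned scale and shift."""
    m = len(xs)
    mean = sum(xs) / m
    var = sum((x - mean) ** 2 for x in xs) / m
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in xs]


def relu(x):
    return max(0.0, x)


def layer(inputs, w, b, gamma=1.0, beta=0.0):
    """Linear map -> batch norm -> activation, in that order."""
    pre = [w * u + b for u in inputs]       # pre-activations
    normed = batch_norm(pre, gamma, beta)   # batch norm comes first
    return [relu(z) for z in normed]        # then the nonlinearity


# Hypothetical mini-batch of inputs for a single unit.
inputs = [0.5, -1.2, 3.0, 0.7]

with_bias = layer(inputs, w=0.8, b=5.0)
no_bias = layer(inputs, w=0.8, b=0.0)

# The bias is removed by the mean subtraction inside batch_norm.
bias_gap = max(abs(a - b) for a, b in zip(with_bias, no_bias))
```

The two outputs agree up to floating-point rounding, which is why the bias term is commonly dropped from layers that are followed by batch norm.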
