Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift on ShortScience.org


Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Sergey Ioffe and Christian Szegedy
arXiv e-Print archive, 2015
Keywords: cs.LG


Summary by Cubs Reading Group 6 years ago

#### Problem addressed:
A strategy for speeding up the training of deep neural networks

#### Summary:
The distribution of each layer's inputs changes constantly while a deep network is trained, because the parameters of the preceding layers keep changing. The authors call this internal covariate shift and argue that it slows down learning of the model parameters. To counter it, they make normalization of each layer's input part of the model architecture. Ideally, the input to each layer would be whitened so that its distribution stays fixed from one training iteration to the next.

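Full whitening is expensive, so they apply two approximations:

1. the normalization statistics are computed over each mini-batch of training data rather than over the whole training set,

2. each input dimension is normalized independently, i.e. the dimensions are assumed to be uncorrelated (no decorrelation step).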
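The two approximations can be illustrated with a minimal NumPy sketch. It assumes a mini-batch stored as an `(m, d)` array with one example per row; the function name `normalize_minibatch` and the value `eps=1e-5` (a small constant for numerical stability) are illustrative choices, not taken from the paper's code. Each column is standardized using only that mini-batch's statistics, and columns are not decorrelated.

```python
import numpy as np

def normalize_minibatch(x, eps=1e-5):
    """Standardize each feature (column) of a mini-batch x with shape (m, d).

    Approximation 1: mean and variance come from this mini-batch only.
    Approximation 2: each column is normalized on its own (no whitening
    across columns), so correlations between features are left intact.
    """
    mu = x.mean(axis=0)    # per-dimension mini-batch mean, shape (d,)
    var = x.var(axis=0)    # per-dimension (biased) mini-batch variance, shape (d,)
    return (x - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=5.0, size=(64, 4))  # 64 examples, 4 features
x_hat = normalize_minibatch(x)
print(np.round(x_hat.mean(axis=0), 6))  # each entry is approximately 0
print(np.round(x_hat.std(axis=0), 3))   # each entry is approximately 1
```

Per-dimension standardization costs O(md) per mini-batch, whereas full whitening would also require estimating and inverting a d × d covariance matrix.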

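Concretely, each dimension $x$ of a layer's input is mean-subtracted and variance-normalized with the mini-batch mean $\mu_\mathcal{B}$ and variance $\sigma^2_\mathcal{B}$: $\hat{x} = (x - \mu_\mathcal{B}) / \sqrt{\sigma^2_\mathcal{B} + \epsilon}$. These operations are differentiable, so gradients are back-propagated through them during training. Because normalization alone could restrict what a layer can represent, the authors also introduce two learnable parameters $(\gamma, \beta)$ per dimension, giving $y = BN(x) = \gamma \hat{x} + \beta$. In a hidden layer, BN is applied to the pre-activation, so the layer computes $z = g(BN(Wu))$, where $g$ is the activation function.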
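The BN transform and its placement before the nonlinearity can be sketched as follows. This is a minimal NumPy forward pass only, assuming row-vector inputs (so $Wu$ is written `u @ W`), ReLU as an illustrative choice of $g$, and hypothetical names `batch_norm` and `relu`; it is not the authors' implementation. The last lines check a property the learnable parameters provide: setting $\gamma = \sqrt{\sigma^2_\mathcal{B} + \epsilon}$ and $\beta = \mu_\mathcal{B}$ undoes the normalization.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """BN forward pass for a mini-batch x of shape (m, d).

    Normalizes each dimension with mini-batch statistics, then applies a
    learnable per-dimension scale (gamma) and shift (beta).
    """
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

def relu(z):
    # Illustrative choice of activation function g.
    return np.maximum(z, 0.0)

rng = np.random.default_rng(1)
m, d_in, d_out = 32, 10, 6
u = rng.normal(size=(m, d_in))      # mini-batch of layer inputs
W = rng.normal(size=(d_in, d_out))  # layer weights (row-vector convention)
gamma = np.ones(d_out)              # start from unit scale
beta = np.zeros(d_out)              # and zero shift

# Hidden layer with BN inserted before the nonlinearity: z = g(BN(Wu)).
z = relu(batch_norm(u @ W, gamma, beta))

# With gamma = sqrt(var + eps) and beta = mean, BN reduces to the identity,
# so adding BN does not remove any function the layer could already express.
x = u @ W
identity = batch_norm(x, np.sqrt(x.var(axis=0) + 1e-5), x.mean(axis=0))
print(np.allclose(identity, x))
```

In training, $\gamma$ and $\beta$ would be updated by gradient descent along with $W$; the identity check only shows that the parameterization is expressive enough.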

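Beyond the intuition above, BN allows much higher learning rates: the output of a BN layer, and the gradient propagated to its input, are unaffected by the scale of the weights $W$, and the bias of the preceding linear layer becomes redundant because mean subtraction cancels it (its role is taken over by $\beta$). The authors also show empirically that BN acts as a regularizer: their BN network trained without dropout performs on par with (or better than) the version with dropout.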
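Both invariances can be checked numerically. The sketch below assumes the same per-dimension normalization as the BN transform (with $\gamma = 1$, $\beta = 0$), row-vector inputs, and an arbitrary scale factor `a = 10.0` and random bias `b` chosen only for illustration. Rescaling $W$ leaves the output unchanged up to the small effect of $\epsilon$, and a bias added before BN is removed exactly by mean subtraction.

```python
import numpy as np

def bn(x, eps=1e-5):
    # Per-dimension mini-batch normalization (gamma = 1, beta = 0).
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

rng = np.random.default_rng(2)
u = rng.normal(size=(128, 8))  # mini-batch of inputs
W = rng.normal(size=(8, 3))    # weights
b = rng.normal(size=3)         # bias of the linear layer
a = 10.0                       # arbitrary positive rescaling of W

out = bn(u @ W)
out_scaled = bn(u @ (a * W))   # BN(aWu) versus BN(Wu)
out_biased = bn(u @ W + b)     # bias before BN is cancelled by mean subtraction

# Scale invariance holds up to the eps term; bias cancellation up to rounding.
print(np.allclose(out, out_scaled, atol=1e-4))
print(np.allclose(out, out_biased))
```

Because the forward output does not depend on the scale of $W$, larger weights do not amplify activations, which is one reason higher learning rates are less likely to make training diverge.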

#### Novelty:
Previous work focused only on whitening the input of the first layer (i.e. the network's input). This work extends the idea to the inputs of all layers and gives a practical way to apply it to real-world data.

#### Datasets:
ImageNet

#### Resources:
presentation video available on cedar server

#### Presenter:
Devansh Arpit
