Layer Normalization
Paper summary

Ba et al. propose layer normalization, which normalizes the activations of a layer by their mean and standard deviation. In contrast to batch normalization, this scheme does not depend on the current mini-batch; it therefore performs the same computation at training and test time. The general scheme, however, is very similar. Given the $l$-th layer of a multi-layer perceptron, $a_i^l = (w_i^l)^T h^l$ and $h_i^{l + 1} = f(a_i^l + b_i^l)$, with $W^l$ being the weight matrix, the activations $a_i^l$ are normalized by a mean $\mu_i^l$ and standard deviation $\sigma_i^l$. For batch normalization, these are estimated over the current mini-batch: $\mu_i^l = \mathbb{E}_{p(x)}[a_i^l]$ and $\sigma_i^l = \sqrt{\mathbb{E}_{p(x)}[(a_i^l - \mu_i^l)^2]}$. This estimation depends heavily on the batch size, and the computation changes between training and test time: at test time, the statistics are estimated over the training set instead. For layer normalization, the statistics are instead computed over the $H$ hidden units of the same layer: $\mu^l = \frac{1}{H}\sum_{i = 1}^H a_i^l$ and $\sigma^l = \sqrt{\frac{1}{H}\sum_{i = 1}^H (a_i^l - \mu^l)^2}$. The normalization thus no longer depends on the batch size. Additionally, layer normalization is invariant to scaling and shifting of the whole weight matrix, whereas for batch normalization this only holds for the individual weight vectors $w_i^l$, i.e., the columns of the matrix. In experiments, the approach is shown to work well for a variety of tasks, including models with attention mechanisms and recurrent neural networks. For convolutional neural networks, the authors state that layer normalization does not outperform batch normalization, but that it performs better than using no normalization at all.

Also find this summary at [davidstutz.de](https://davidstutz.de/category/reading/).
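As a quick illustration of the formulas above, here is a minimal NumPy sketch of layer normalization (function and variable names are my own; the gain $g$ and bias $b$ correspond to the learned re-scaling parameters from the paper, and the small `eps` is an addition of mine for numerical stability):

```python
import numpy as np

def layer_norm(a, g, b, eps=1e-5):
    """Normalize pre-activations a over the H hidden units of one layer.

    a: pre-activations, shape (batch, H); statistics are computed per
       sample over the H hidden units, so they are batch-size independent.
    g, b: learned gain and bias, shape (H,).
    eps: small constant for numerical stability (not in the paper's formulas).
    """
    mu = a.mean(axis=-1, keepdims=True)                             # mu^l
    sigma = np.sqrt(((a - mu) ** 2).mean(axis=-1, keepdims=True))   # sigma^l
    return g * (a - mu) / (sigma + eps) + b

# Usage: a layer with H = 4 hidden units and a batch of 2 samples.
a = np.random.randn(2, 4)
h = np.tanh(layer_norm(a, g=np.ones(4), b=np.zeros(4)))
```

Because the mean and standard deviation are taken along the hidden-unit axis, each sample is normalized independently of the rest of the batch, which is exactly why the same computation applies at training and test time.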
Layer Normalization
Jimmy Lei Ba and Jamie Ryan Kiros and Geoffrey E. Hinton
arXiv e-Print archive - 2016
Keywords: stat.ML, cs.LG

Summary by David Stutz