Layer Normalization
Paper summary
Ba et al. propose layer normalization, which normalizes the activations of a layer by their mean and standard deviation. In contrast to batch normalization, this scheme does not depend on the current batch; thus, it performs the same computation at training and test time. The general scheme, however, is very similar. Given the $l$-th layer of a multi-layer perceptron, $a_i^l = (w_i^l)^T h^l$ and $h_i^{l + 1} = f(a_i^l + b_i^l)$, with $w_i^l$ being the incoming weights of the $i$-th hidden unit in the weight matrix $W^l$, the activations $a_i^l$ are normalized by mean $\mu_i^l$ and standard deviation $\sigma_i^l$.

For batch normalization, these are estimated over the current mini-batch: $\mu_i^l = \mathbb{E}_{p(x)} [a_i^l]$ and $\sigma_i^l = \sqrt{\mathbb{E}_{p(x)} [(a_i^l - \mu_i^l)^2]}$. However, this estimate depends heavily on the batch size; additionally, the computation differs between training and test time (at test time, the statistics are estimated over the training set). For layer normalization, instead, the statistics are computed over the $H$ activations of the same layer: $\mu^l = \frac{1}{H}\sum_{i = 1}^H a_i^l$ and $\sigma^l = \sqrt{\frac{1}{H}\sum_{i = 1}^H (a_i^l - \mu^l)^2}$. Thus, the normalization no longer depends on the batch size. Additionally, layer normalization is invariant to scaling and shifts of the weight matrix (for batch normalization, this only holds for the columns of the matrix).

In experiments, this approach is shown to work well for a variety of tasks, including models with attention mechanisms and recurrent neural networks. For convolutional neural networks, the authors state that layer normalization does not outperform batch normalization, but performs better than using no normalization at all.
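To make the difference between the two sets of statistics concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the toy weights and inputs, the small `eps` added under the square root for numerical stability, and the convention that $w_i^l$ is the $i$-th row of `W` are all assumptions made for illustration. The learned gain and bias that the paper applies after normalization are left out.

```python
import math


def pre_activations(W, h):
    """a_i = w_i^T h, taking w_i as the i-th row of W (an assumed convention)."""
    return [sum(w * x for w, x in zip(w_i, h)) for w_i in W]


def layer_norm(a, eps=1e-5):
    """Normalize one sample's H pre-activations by their own mean and std.

    eps is a hypothetical stabilizer, not part of the formulas in the summary.
    """
    H = len(a)
    mu = sum(a) / H
    var = sum((x - mu) ** 2 for x in a) / H
    sigma = math.sqrt(var + eps)
    return [(x - mu) / sigma for x in a]


def batch_norm(batch, eps=1e-5):
    """Normalize every unit i across the mini-batch (training-time statistics only)."""
    n, H = len(batch), len(batch[0])
    out = [[0.0] * H for _ in range(n)]
    for i in range(H):
        column = [sample[i] for sample in batch]
        mu = sum(column) / n
        var = sum((x - mu) ** 2 for x in column) / n
        sigma = math.sqrt(var + eps)
        for s in range(n):
            out[s][i] = (batch[s][i] - mu) / sigma
    return out


# Toy layer with H = 4 hidden units and 3 inputs (made-up numbers).
W = [[0.5, -1.0, 2.0],
     [1.5, 0.3, -0.7],
     [-0.2, 0.8, 0.1],
     [1.0, 1.0, -1.0]]
h = [1.0, 2.0, -0.5]

a = pre_activations(W, h)
y = layer_norm(a)

# Scale the whole weight matrix and add the same vector to every row:
# each a_i is scaled by delta and shifted by the same constant gamma^T h,
# which the layer statistics remove again.
delta = 3.0
gamma = [0.4, -0.2, 0.9]
W_shifted = [[delta * w + g for w, g in zip(w_i, gamma)] for w_i in W]
y_shifted = layer_norm(pre_activations(W_shifted, h))

# The same sample placed in two different mini-batches.
batch_a = [a, [0.0, 1.0, 2.0, 3.0], [1.0, 1.0, 1.0, 1.0]]
batch_b = [a, [5.0, -1.0, 0.5, 2.0], [-3.0, 0.0, 4.0, 1.0]]

ln_a = [layer_norm(sample) for sample in batch_a][0]
ln_b = [layer_norm(sample) for sample in batch_b][0]
bn_a = batch_norm(batch_a)[0]
bn_b = batch_norm(batch_b)[0]

print("layer norm, batch A vs. B:", ln_a, ln_b)
print("batch norm, batch A vs. B:", bn_a, bn_b)
```

In this sketch the layer-normalized output of the first sample is the same in both mini-batches, while its batch-normalized output changes with the other samples in the batch. Because of `eps`, the invariance to rescaling the weight matrix holds only approximately here; with `eps=0.0` and non-constant activations it would be exact up to floating-point error.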
Layer Normalization
Jimmy Lei Ba and Jamie Ryan Kiros and Geoffrey E. Hinton
arXiv e-Print archive - 2016 via Local arXiv
Keywords: stat.ML, cs.LG


Summary by Denny Britz 4 years ago
Summary by David Stutz 1 year ago
