Regularizing by the Variance of the Activations' Sample-Variances Regularizing by the Variance of the Activations' Sample-Variances
Paper summary Littwin and Wolf propose a activation variance regularizer that is shown to have a similar, even better, effect than batch normalization. The proposed regularizer is based on an analysis of the variance of activation values; the idea is that the measured variance of these variances is low if the activation values come from a distribution with few modes. Thus, the intention of the regularizer is to encourage distributions of activations with only few modes. This is achieved using the regularizers $\mathbb{E}[(1 - \frac{\sigma_s^2}{\sigma^2})^2]$ where $\sigma_s^2$ is the measured variance of activation values and $\sigma^2$ is the true variance of activation values. The estimate $\sigma^2_s$ is mostly influenced by the mini-batch used for training. In practice, the regularizer is replaced by $(1 - \frac{\sigma_{s_1}^2}{\sigma_{s_2}^2 + \beta})^2$ which can be estimated on two different batches, $s_1$ and $s_2$, during training and $\beta$ is a parameter that can be learned and mainly handles the case where the variance is close to zero. In the paper, the authors provide some theoretical bounds and also make a connection to batch normalization and in which cases and why the regularizer might be a better alternative. These claims are supported by experiments on Cifar and Tiny ImageNet. Also find this summary at [davidstutz.de](https://davidstutz.de/category/reading/).
papers.nips.cc
scholar.google.com
Regularizing by the Variance of the Activations' Sample-Variances
Littwin, Etai and Wolf, Lior
Neural Information Processing Systems Conference - 2018 via Local Bibsonomy
Keywords: dblp


[link]
Summary by David Stutz 3 weeks ago
Loading...
Your comment:


ShortScience.org allows researchers to publish paper summaries that are voted on and ranked!
About

Sponsored by: and