Adding Gradient Noise Improves Learning for Very Deep Networks on ShortScience.org

arxiv.org
arxiv-vanity.com
scholar.google.com

Adding Gradient Noise Improves Learning for Very Deep Networks
Arvind Neelakantan and Luke Vilnis and Quoc V. Le and Ilya Sutskever and Lukasz Kaiser and Karol Kurach and James Martens
arXiv e-Print archive - 2015 via Local arXiv
Keywords: stat.ML, cs.LG
more

Summaries/Notes 1

[link] Summary by David Stutz 4 years ago

Neelakantan et al. study gradient noise for improving neural network training. In particular, they add Gaussian noise to the gradients in each iteration:

$\tilde{\nabla}f = \nabla f + \mathcal{N}(0, \sigma^2)$

where the variance $\sigma^2$ is adapted throughout training as follows:

$\sigma^2 = \frac{\eta}{(1 + t)^\gamma}$

where $\eta$ and $\gamma$ are hyper-parameters and $t$ the current iteration. In experiments, the authors show that gradient noise has the potential to improve accuracy, especially given optimization.

Also find this summary at [davidstutz.de](https://davidstutz.de/category/reading/).

Your comment: