Adding Gradient Noise Improves Learning for Very Deep Networks
Paper summary: Neelakantan et al. study gradient noise as a way to improve neural network training. In particular, they add Gaussian noise to the gradient in each iteration: $\tilde{\nabla}f = \nabla f + \mathcal{N}(0, \sigma^2)$, where the variance $\sigma^2$ is annealed throughout training as $\sigma^2 = \frac{\eta}{(1 + t)^\gamma}$, with $\eta$ and $\gamma$ hyper-parameters and $t$ the current iteration. In experiments, the authors show that gradient noise can improve accuracy, especially for very deep networks and difficult optimization problems. Also find this summary at [davidstutz.de](https://davidstutz.de/category/reading/).
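The annealed noise schedule above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code; the defaults $\gamma = 0.55$ and $\eta \in \{0.01, 0.3, 1.0\}$ are the values reported in the paper, but the function name and interface are my own:

```python
import numpy as np

def noisy_gradient(grad, t, eta=0.3, gamma=0.55, rng=None):
    """Add annealed Gaussian noise to a gradient.

    Implements sigma^2 = eta / (1 + t)^gamma, so the noise variance
    decays as training progresses; eta and gamma are hyper-parameters
    and t is the current iteration.
    """
    rng = np.random.default_rng() if rng is None else rng
    sigma = np.sqrt(eta / (1.0 + t) ** gamma)
    # Sample N(0, sigma^2) noise of the same shape as the gradient.
    return grad + rng.normal(0.0, sigma, size=np.shape(grad))
```

In an SGD loop one would simply replace the raw gradient with `noisy_gradient(grad, t)` before the parameter update; since $\sigma^2 \to 0$ as $t$ grows, the update approaches plain SGD late in training.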
Arvind Neelakantan and Luke Vilnis and Quoc V. Le and Ilya Sutskever and Lukasz Kaiser and Karol Kurach and James Martens
arXiv e-Print archive - 2015 via Local arXiv
Keywords: stat.ML, cs.LG
Summary by David Stutz