Neelakantan et al. study gradient noise as a way to improve neural network training. In particular, they add Gaussian noise to the gradient in each iteration: $\tilde{\nabla}f = \nabla f + \mathcal{N}(0, \sigma^2)$, where the variance $\sigma^2$ is annealed over the course of training: $\sigma^2 = \frac{\eta}{(1 + t)^\gamma}$, where $\eta$ and $\gamma$ are hyperparameters and $t$ is the current iteration. In experiments, the authors show that gradient noise has the potential to improve accuracy, especially when optimization is difficult. Also find this summary at [davidstutz.de](https://davidstutz.de/category/reading/).
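To make the noise schedule concrete, here is a minimal sketch of plain gradient descent with annealed Gaussian gradient noise, using only the Python standard library. The function names (`noise_variance`, `noisy_gradient`, `sgd_with_gradient_noise`), the default values for $\eta$ and $\gamma$, the learning rate and the toy quadratic objective are illustrative assumptions, not taken from the summary; only the two formulas above are.

```python
import math
import random


def noise_variance(t, eta=0.3, gamma=0.55):
    """Annealed variance sigma^2 = eta / (1 + t)^gamma at iteration t.

    The defaults for eta and gamma are illustrative assumptions.
    """
    return eta / (1.0 + t) ** gamma


def noisy_gradient(grad, t, eta=0.3, gamma=0.55, rng=random):
    """Add independent N(0, sigma^2) noise to every gradient component."""
    sigma = math.sqrt(noise_variance(t, eta, gamma))
    return [g + rng.gauss(0.0, sigma) for g in grad]


def sgd_with_gradient_noise(grad_fn, x0, steps=2000, lr=0.05,
                            eta=0.3, gamma=0.55, seed=0):
    """Plain gradient descent where each step uses the noisy gradient."""
    rng = random.Random(seed)  # seeded for reproducibility
    x = list(x0)
    for t in range(steps):
        g = noisy_gradient(grad_fn(x), t, eta, gamma, rng)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x


if __name__ == "__main__":
    # Hypothetical toy objective f(x) = sum(x_i^2) with gradient 2 * x.
    quadratic_grad = lambda x: [2.0 * xi for xi in x]
    print(sgd_with_gradient_noise(quadratic_grad, [1.0, -2.0]))
```

Because the variance decays with $t$, the noise is largest in early iterations and fades later, so the iterates can still settle near a minimum. In a real training loop the same noise would be added to each parameter's gradient before the optimizer step.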
