Generalization in Deep Networks: The Role of Distance from Initialization
Paper summary

Nagarajan and Kolter show that neural networks are implicitly regularized by stochastic gradient descent to stay close to their initialization. This implicit regularization may explain the good generalization performance of over-parameterized neural networks: empirically, more complex models often generalize better, contradicting the classical trade-off between expressivity and generalization in machine learning. On MNIST, the authors show that the distance of the network's parameters from the original initialization (measured using the $L_2$ norm on the flattened parameters) decreases with increasing width and increases with increasing sample size. Additionally, the distance increases significantly when fitting corrupted labels, which may indicate that memorization requires traveling a larger distance in parameter space. Also find this summary at [davidstutz.de](https://davidstutz.de/category/reading/).
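The distance measure used in the experiments can be sketched as follows. This is a minimal NumPy sketch, not the authors' code; `distance_from_init` is a hypothetical helper that flattens and concatenates all parameter tensors before taking the $L_2$ norm:

```python
import numpy as np

def distance_from_init(init_params, current_params):
    # Hypothetical helper: L2 distance between two parameter
    # snapshots, computed on the flattened, concatenated tensors.
    init_flat = np.concatenate([p.ravel() for p in init_params])
    curr_flat = np.concatenate([p.ravel() for p in current_params])
    return np.linalg.norm(curr_flat - init_flat)

# Toy example with two "layers" (a weight matrix and a bias vector).
init = [np.zeros((2, 2)), np.zeros(3)]
trained = [np.ones((2, 2)), np.ones(3)]
d = distance_from_init(init, trained)  # sqrt(4 + 3) = sqrt(7)
```

In a training loop one would snapshot the parameters once at initialization and evaluate this distance after each epoch, e.g. across different widths or with corrupted labels, to reproduce the trends the paper reports.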
Nagarajan, Vaishnavh and Kolter, J. Zico
arXiv e-Print archive - 2019 via Local Bibsonomy


Summary by David Stutz