Early Stopping without a Validation Set
Paper summary from [reddit](https://www.reddit.com/r/MachineLearning/comments/623oq4/r_early_stopping_without_a_validation_set/dfjzwqq/):

We want to minimize the expected risk (loss), but that is a mean over the true data distribution, which we don't know. We approximate it with a finite dataset and minimize the empirical risk instead. The gradients of the empirical risk are therefore an approximation to the gradients of the expected risk: the true gradients carry only information, while the approximated gradients carry information plus noise, the noise arising from approximating the true distribution with a finite dataset.

By computing local statistics of the gradients during training, the authors can detect the point at which the gradients no longer carry information about the expected risk and only noise remains. Continuing to optimize past that point leads to overfitting, so that is where training should stop.
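The idea can be sketched as a per-parameter signal-to-noise test: the mini-batch gradient is a sample mean whose variance we can estimate from the individual per-sample gradients, and once the squared gradient is no larger than its own sampling variance, only noise is left. This is a simplified illustration in that spirit, not the authors' exact evidence-based criterion; `tau` is a hypothetical slack threshold and the data are synthetic.

```python
import numpy as np

def gradient_noise_stop(per_sample_grads, tau=2.0, eps=1e-12):
    """Simplified gradient signal-to-noise stopping test (illustrative
    sketch, not the paper's exact evidence-based criterion).

    per_sample_grads: shape (B, D) -- gradient of each of the B
    mini-batch samples w.r.t. the D parameters.

    The mini-batch gradient g is a sample mean with variance Sigma/B.
    When the true (expected-risk) gradient is zero, E[g_k^2] is about
    Sigma_k / B, so the per-parameter ratio B * g_k^2 / Sigma_k drops
    to ~1: the signal is gone and only sampling noise remains.
    """
    B, _ = per_sample_grads.shape
    g = per_sample_grads.mean(axis=0)            # mini-batch gradient
    var = per_sample_grads.var(axis=0, ddof=1)   # per-parameter variance
    snr = B * g**2 / (var + eps)                 # per-parameter signal/noise
    # Stop once the average SNR falls to the noise floor (tau is slack).
    return bool(snr.mean() <= tau)

rng = np.random.default_rng(0)
B, D = 256, 100

# Informative phase: per-sample gradients share a common direction.
early = np.ones(D) + rng.normal(0.0, 1.0, size=(B, D))
print(gradient_noise_stop(early))   # False here means: keep training

# Noise-only phase: the true gradient is zero, samples are pure noise.
late = rng.normal(0.0, 1.0, size=(B, D))
print(gradient_noise_stop(late))    # True here means: stop
```

In the informative phase the average SNR is on the order of the batch size B, far above the threshold; in the noise-only phase it hovers around 1, so the test fires without ever consulting a validation set.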
Maren Mahsereci, Lukas Balles, Christoph Lassner, Philipp Hennig
arXiv e-Print archive, 2017
Keywords: cs.LG, stat.ML


