An overview of gradient descent optimization algorithms
Sebastian Ruder, 2016

Paper summary (anonymous)

This is originally from a web post, with added content about noisy SGD methods. The demo experiment is interesting, but I would like to run it myself to see the result.
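For readers unfamiliar with noisy SGD, a minimal sketch is below. It adds annealed Gaussian noise to each gradient before the update; the decay schedule sigma_t^2 = eta / (1 + t)^gamma and the constants are assumptions for illustration, since the post does not pin down an exact variant.

```python
import random
import math

def noisy_sgd(grad, w0, lr=0.1, eta=0.3, gamma=0.55, steps=200, seed=0):
    """Plain SGD with annealed Gaussian gradient noise.

    Noise variance follows eta / (1 + t)**gamma, a commonly used
    annealing schedule (an assumption; the exact variant in the
    post may differ).
    """
    rng = random.Random(seed)
    w = w0
    for t in range(steps):
        sigma = math.sqrt(eta / (1 + t) ** gamma)
        # Perturb the gradient, then take an ordinary SGD step.
        w -= lr * (grad(w) + rng.gauss(0.0, sigma))
    return w

# Toy objective f(w) = (w - 3)^2, whose gradient is 2 * (w - 3);
# the iterate should settle near the minimizer w = 3.
w_star = noisy_sgd(lambda w: 2.0 * (w - 3.0), w0=0.0)
```

Because the noise variance shrinks over time, early iterations explore while later ones settle near the minimizer.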

First published: 2016/09/15

Abstract: Gradient descent optimization algorithms, while increasingly popular, are
often used as black-box optimizers, as practical explanations of their
strengths and weaknesses are hard to come by. This article aims to provide the
reader with intuitions with regard to the behaviour of different algorithms
that will allow her to put them to use. In the course of this overview, we look
at different variants of gradient descent, summarize challenges, introduce the
most common optimization algorithms, review architectures in a parallel and
distributed setting, and investigate additional strategies for optimizing
gradient descent.
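The variants the abstract refers to differ only in how much data each update sees: batch gradient descent uses the full dataset, SGD a single example, and mini-batch gradient descent something in between. A small sketch on a toy least-squares problem (the data and hyperparameters here are illustrative assumptions, not from the paper):

```python
import numpy as np

# Toy linear regression data: y = X @ true_w exactly.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w

w = np.zeros(3)
lr = 0.1
batch_size = 10  # batch_size=1 recovers SGD; batch_size=len(X) recovers batch GD

for epoch in range(50):
    # Shuffle once per epoch, then sweep over mini-batches.
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        b = idx[start:start + batch_size]
        # Gradient of mean squared error on the mini-batch.
        grad = 2.0 * X[b].T @ (X[b] @ w - y[b]) / len(b)
        w -= lr * grad
```

Since the system is exactly consistent, all three variants converge to `true_w`; they differ in per-update cost and gradient variance, which is the trade-off the paper's first section covers.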

