The authors propose to accelerate stochastic gradient optimization by reducing the variance of the noisy gradient estimate via the 'control variate' trick (a standard variance reduction technique for Monte Carlo simulations, explained in [3] for example). The control variate is a vector that is hopefully highly correlated with the noisy gradient but whose expectation is easier to compute. Standard convergence rates for stochastic gradient optimization depend on the variance of the gradient estimates, so a variance reduction technique should accelerate convergence. The authors construct control variates from Taylor approximations of the gradient estimate, both for the optimization problem arising in regularized logistic regression and for MAP estimation in the latent Dirichlet allocation (LDA) model.

They compare constant-stepsize SGD with and without variance reduction for logistic regression on the covtype dataset, claiming that the variance reduction allows the use of larger stepsizes without suffering from high variance, and thus yields faster empirical convergence. For LDA, they compare the adaptive-stepsize version of the stochastic optimization method of [10] with and without variance reduction, showing faster convergence of the held-out test log-likelihood on three large corpora.
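To make the control variate construction concrete, here is a minimal numerical sketch for the logistic-regression case. The data, dimensions, and current iterate are illustrative assumptions (not from the paper), and the expectation of the control variate is computed empirically over the full dataset rather than via a closed form; the point is only to show that subtracting a Taylor-based control variate (and adding back its mean) leaves the gradient estimate unbiased while shrinking its variance.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- synthetic logistic-regression problem (illustrative assumption) ---
n, d = 2000, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ w_true))).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = rng.normal(size=d)  # current iterate

# Per-example noisy gradient g_i = (sigmoid(x_i . w) - y_i) x_i
G = (sigmoid(X @ w) - y)[:, None] * X            # shape (n, d)

# Control variate h_i: first-order Taylor expansion of the sigmoid around
# z_bar = x_bar . w, so that E[h_i] involves only dataset-level statistics.
x_bar = X.mean(axis=0)
z_bar = x_bar @ w
s = sigmoid(z_bar)
ds = s * (1.0 - s)                               # sigmoid derivative at z_bar
H = (s + ds * (X @ w - z_bar) - y)[:, None] * X  # shape (n, d)
Eh = H.mean(axis=0)  # computed empirically here; a closed form would use X^T X w, X^T y

# Variance-reduced estimate: g_i - h_i + E[h] has the same expectation as g_i,
# but much lower variance because h_i is highly correlated with g_i.
VR = G - H + Eh

print("trace of variance, plain   :", G.var(axis=0).sum())
print("trace of variance, reduced :", VR.var(axis=0).sum())
```

Sampling a single index i and using `G[i] - H[i] + Eh` as the gradient then gives an unbiased, lower-variance stochastic gradient step.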