Adaptive Subgradient Methods for Online Learning and Stochastic Optimization on ShortScience.org


Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
Duchi, John C. and Hazan, Elad and Singer, Yoram
Conference on Learning Theory - 2010 via Local Bibsonomy
Keywords: dblp


Summary by Joseph Paul Cohen 8 years ago

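This paper introduces Adagrad, an adaptive learning rate method: each parameter's step is divided by the square root of the running sum of its squared gradients, so parameters that have received large or frequent gradients take smaller steps. Sample code from [Stanford CS231n](https://cs231n.github.io/neural-networks-3/#ada):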

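```python
# Assume the gradient dx and parameter vector x (NumPy arrays of the same shape).
# cache starts as zeros shaped like x; eps is a small constant that avoids division by zero.
cache += dx**2  # accumulate squared gradients per parameter
x += - learning_rate * dx / (np.sqrt(cache) + eps)  # per-parameter scaled step
```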
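
To see the update in context, here is a minimal self-contained sketch that wraps the same two lines in a loop. The function name `adagrad`, the `grad_fn` argument, the hyperparameter values, and the badly scaled quadratic used as a test problem are all assumptions chosen for illustration; none of them come from the paper or from CS231n.

```python
import numpy as np

def adagrad(grad_fn, x0, learning_rate=0.5, eps=1e-8, steps=500):
    """Run a fixed number of Adagrad steps (illustrative helper, not from the paper)."""
    x = np.asarray(x0, dtype=float).copy()
    cache = np.zeros_like(x)  # running sum of squared gradients, one entry per parameter
    for _ in range(steps):
        dx = grad_fn(x)
        cache += dx**2
        x -= learning_rate * dx / (np.sqrt(cache) + eps)
    return x

# Hypothetical test problem: f(x) = 0.5 * (100 * x[0]**2 + x[1]**2),
# whose gradient is [100 * x[0], x[1]].
scales = np.array([100.0, 1.0])
x_final = adagrad(lambda x: scales * x, x0=[1.0, 1.0])
print(x_final)
```

On this toy problem the two coordinates have gradients that differ by a factor of 100, yet they follow almost the same trajectory, because dividing by `np.sqrt(cache)` cancels the per-coordinate scale (up to the small `eps` term). That per-parameter rescaling is the point of the method.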
