Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
Paper summary by joecohen

This is Adagrad. Adagrad is an adaptive learning rate method: it keeps a per-parameter running sum of squared gradients and divides each update by the square root of that sum (plus a small constant `eps` for numerical stability), so parameters that have seen large gradients take progressively smaller steps. Some sample code from [Stanford CS231n](https://cs231n.github.io/neural-networks-3/#ada) is:
```python
# Assume the gradient dx and parameter vector x
cache += dx**2
x += - learning_rate * dx / (np.sqrt(cache) + eps)
```
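To make the snippet runnable end to end, here is a minimal self-contained sketch of the same update applied to a toy least-squares problem. The objective, `learning_rate`, `eps`, and number of steps below are illustrative choices, not values from the paper or the CS231n notes.

```python
import numpy as np

# Toy least-squares problem: minimize 0.5 * ||Ax - b||^2
# (problem size and hyperparameters are illustrative, not from the paper)
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 10))
b = rng.normal(size=100)

def grad(x):
    # Gradient of 0.5 * ||Ax - b||^2
    return A.T @ (A @ x - b)

x = np.zeros(10)
cache = np.zeros_like(x)   # per-coordinate sum of squared gradients
learning_rate = 0.5
eps = 1e-8                 # avoids division by zero

for step in range(500):
    dx = grad(x)
    cache += dx**2                                     # accumulate squared gradients
    x -= learning_rate * dx / (np.sqrt(cache) + eps)   # per-coordinate scaled step

print("final objective:", 0.5 * np.sum((A @ x - b)**2))
```

The per-coordinate division by `np.sqrt(cache)` is what makes the effective step size shrink fastest for the coordinates that have accumulated the largest gradients.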
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
Duchi, John C. and Hazan, Elad and Singer, Yoram
Conference on Learning Theory - 2010 via Local Bibsonomy
Keywords: dblp