Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
Duchi, John C. and Hazan, Elad and Singer, Yoram, 2010

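Paper summary (joecohen)

This is the Adagrad paper. Adagrad is an adaptive learning rate method: each parameter's step is divided by the square root of that parameter's accumulated squared gradients. Some sample code from [[Stanford CS231n]](https://cs231n.github.io/neural-networks-3/#ada) is: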
```python
# Assume the gradient dx and parameter vector x
cache += dx**2
x += - learning_rate * dx / (np.sqrt(cache) + eps)
```
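
To see the update run end to end, here is a minimal sketch that wraps the two lines above in a loop. The helper `adagrad_minimize`, its default hyperparameters, and the toy quadratic objective are all assumptions chosen for illustration; they do not come from the paper or from CS231n.

```python
import numpy as np

def adagrad_minimize(grad_fn, x0, learning_rate=0.5, eps=1e-8, steps=100):
    """Plain Adagrad loop; this helper and its defaults are illustrative assumptions."""
    x = np.asarray(x0, dtype=float).copy()
    cache = np.zeros_like(x)  # per-parameter running sum of squared gradients
    for _ in range(steps):
        dx = grad_fn(x)
        cache += dx**2
        # Each parameter's step is divided by the root of its own gradient history.
        x -= learning_rate * dx / (np.sqrt(cache) + eps)
    return x

# Toy objective (an assumption): f(x) = 0.5 * sum(scales * x**2),
# whose curvature differs by a factor of 100 between the two coordinates.
scales = np.array([100.0, 1.0])
x_final = adagrad_minimize(lambda x: scales * x, x0=[1.0, 1.0])
print(x_final)
```

Because each coordinate is normalized by its own gradient history, both coordinates on this toy problem should shrink toward zero at a similar rate even though their gradients differ in scale by 100. Note also that `cache` only ever grows, so the effective step size for a parameter never increases over the course of training.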
