Adam: A Method for Stochastic Optimization. Kingma, Diederik P. and Ba, Jimmy. 2014.

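Paper summary by joecohen

Adam is like RMSProp with momentum: alongside RMSProp's moving average of the squared gradient (`v`), it keeps an exponential moving average of the gradient itself (`m`) and steps along `m` instead of the raw gradient `dx`. The (simplified) update [[Stanford CS231n]](https://cs231n.github.io/neural-networks-3/#ada) looks as follows: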
```
m = beta1*m + (1-beta1)*dx
v = beta2*v + (1-beta2)*(dx**2)
x += - learning_rate * m / (np.sqrt(v) + eps)
```
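
The simplified update above leaves out the bias-correction step of the full algorithm. Because `m` and `v` start at zero, their early values are biased toward zero, and the paper compensates by dividing them by `1 - beta1**t` and `1 - beta2**t`, where `t` is the step count. A minimal NumPy sketch of one such step follows, assuming the default hyperparameters suggested in the paper; the function name `adam_step` and the toy quadratic objective are illustrative choices, not part of the original summary.

```python
import numpy as np


def adam_step(x, dx, m, v, t, learning_rate=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update with bias correction; t is the 1-based step count.

    Hypothetical helper for illustration only.
    """
    m = beta1 * m + (1 - beta1) * dx         # moving average of the gradient
    v = beta2 * v + (1 - beta2) * (dx ** 2)  # moving average of the squared gradient
    m_hat = m / (1 - beta1 ** t)             # correct the bias from initializing m at zero
    v_hat = v / (1 - beta2 ** t)             # correct the bias from initializing v at zero
    x = x - learning_rate * m_hat / (np.sqrt(v_hat) + eps)
    return x, m, v


# Toy usage (assumed objective): minimize f(x) = sum(x**2), whose gradient is 2*x.
x = np.array([1.0, -2.0])
m = np.zeros_like(x)
v = np.zeros_like(x)
for t in range(1, 2001):
    dx = 2 * x
    x, m, v = adam_step(x, dx, m, v, t, learning_rate=0.01)
```

With the correction in place, the very first step has magnitude close to `learning_rate` in every coordinate regardless of the gradient's scale, since `m_hat / sqrt(v_hat)` reduces to roughly `dx / |dx|` at `t = 1`.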
