Asynchronous Methods for Deep Reinforcement Learning on ShortScience.org

arxiv.org
scholar.google.com

Asynchronous Methods for Deep Reinforcement Learning
Mnih, Volodymyr and Badia, Adrià Puigdomènech and Mirza, Mehdi and Graves, Alex and Lillicrap, Timothy P. and Harley, Tim and Silver, David and Kavukcuoglu, Koray
arXiv e-Print archive - 2016 via Local Bibsonomy
Keywords: dblp

Summaries/Notes 2

[link] Summary by fabianboth 7 years ago

The main contribution of [Asynchronous Methods for Deep Reinforcement Learning](https://arxiv.org/pdf/1602.01783v1.pdf) by Mnih et al. is a ligthweight framework for reinforcement learning agents. 

They propose a training procedure which utilizes asynchronous gradient decent updates from multiple agents at once. Instead of training one single agent who interacts with its environment, multiple agents are interacting with their own version of the environment simultaneously.

After a certain amount of timesteps, accumulated gradient updates from an agent are applied to a global model, e.g. a Deep Q-Network. These updates are asynchronous and lock free. 

Effects of training speed and quality are analyzed for various reinforcement learning methods. No replay memory is need to decorrelate successive game states, since all agents are already exploring different game states in real time. Also, on-policy algorithms like actor-critic can be applied.

They show that asynchronous updates have a stabilizing effect on policy and value updates. Also, their best method, an asynchronous variant of actor-critic, surpasses the current state-of-the-art on the Atari domain while training for half the time on a single multi-core CPU instead of a GPU.

Your comment: