Addressing Function Approximation Error in Actor-Critic Methods
Paper summary

As in Q-learning, modern actor-critic methods suffer from value estimation errors caused by function approximation bias and variance. While many attempts have been made to address this in Q-learning (such as Double DQN), little had been done for actor-critic methods. The authors propose three modifications to DDPG (the combined algorithm is called TD3) and empirically show that they help address both the bias and the variance issues:

* 1.) Clipped Double Q-Learning: Add a second critic with its own target network (so four value networks in total: $Q_{\theta_1}, Q_{\theta_2}$ and their target copies) and cap the value target at the smaller of the two target-critic estimates, which counteracts overestimation bias: $y = r + \gamma \min\limits_{i=1,2} Q_{\theta_{\text{target},i}}(s', \pi_{\phi_1}(s'))$
* 2.) Delayed updates: Update the policy and target networks less frequently than the critics, and make the target network updates small and soft: $\theta_{\text{target}} \leftarrow \tau\theta + (1-\tau)\theta_{\text{target}}$
* 3.) Target policy smoothing: Inject clipped random noise into the target policy's action: $\hat{a} \leftarrow \pi_{\phi_{\text{target}}}(s) + \text{clip}(\mathcal{N}(0,\sigma), -c, c)$

Implementing these modifications (see the sketch after this summary), the authors show significant improvements on seven continuous control tasks, beating not only the reference DDPG algorithm but also PPO, TRPO and ACKTR.

Full algorithm from the paper: https://i.imgur.com/rRjwDyT.png
Source code: https://github.com/sfujim/TD3
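To make the three modifications concrete, here is a minimal PyTorch-style sketch of the target computation and the soft update. All names (`td3_target`, `actor_target`, `critic1_target`, `soft_update`) and the hyperparameter defaults are illustrative assumptions, not taken from the paper's reference implementation:

```python
import torch

def td3_target(reward, next_state, done, actor_target,
               critic1_target, critic2_target,
               gamma=0.99, sigma=0.2, noise_clip=0.5, max_action=1.0):
    """Clipped Double Q-learning target with target policy smoothing.

    Network arguments are assumed to be torch.nn.Modules; hyperparameter
    defaults are common choices, not values prescribed by the summary.
    """
    with torch.no_grad():
        # (3) Target policy smoothing: perturb the target action with
        # clipped Gaussian noise, then clip back to the valid action range.
        next_action = actor_target(next_state)
        noise = (torch.randn_like(next_action) * sigma).clamp(-noise_clip, noise_clip)
        next_action = (next_action + noise).clamp(-max_action, max_action)

        # (1) Clipped Double Q-learning: the elementwise minimum of the two
        # target critics caps the target at the smaller estimate.
        q_min = torch.min(critic1_target(next_state, next_action),
                          critic2_target(next_state, next_action))
        target_q = reward + gamma * (1.0 - done) * q_min
    return target_q

def soft_update(net, target_net, tau=0.005):
    # (2) Soft target update: theta_target <- tau*theta + (1-tau)*theta_target.
    # In TD3 this (and the actor update) runs only every d-th critic update.
    with torch.no_grad():
        for p, p_t in zip(net.parameters(), target_net.parameters()):
            p_t.mul_(1.0 - tau).add_(tau * p)
```

Both critics would be regressed toward the same `target_q`, while the actor and the soft updates run only every d-th iteration, which is the "delayed" part of point 2.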
Addressing Function Approximation Error in Actor-Critic Methods
Scott Fujimoto and Herke van Hoof and Dave Meger
arXiv e-Print archive - 2018
Keywords: cs.AI, cs.LG, stat.ML

Summary by Roman Ring