Model-Based Reinforcement Learning via Meta-Policy Optimization
Paper summary

In model-based RL, the learned dynamics model is inevitably imperfect, which often leads the policy to overfit to that model: it performs well in the learned simulator but not in the real world.

Key solution idea: there is no need to learn one accurate simulator. Instead, learn an ensemble of models that together sufficiently represent the space of plausible dynamics. Treating the ensemble as many learned simulators lets us denoise estimates of policy performance. In a meta-learning sense, these simulators become the tasks; the real world is then just another task, to which the policy can adapt quickly.

One experimental observation is that at the start of training there is a lot of variation between the learned simulators, and the simulations converge over the course of training, which might also suggest that this approach provides improved exploration.

This summary was written with the help of Pieter Abbeel.
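To make the training loop concrete, here is a minimal runnable sketch of an MB-MPO-style procedure: collect real transitions, fit an ensemble of dynamics models, adapt the policy separately to each model with one inner policy-gradient step, and meta-update the pre-adaptation parameters so that a single adaptation step works well on every model in the ensemble. The toy 1-D environment, the linear-Gaussian policy, the bootstrapped least-squares models, the plain REINFORCE gradient estimator, and the first-order approximation of the meta-gradient are all simplifying assumptions made for illustration; this is not the authors' implementation.

```python
# Minimal MB-MPO-style sketch (illustrative only, not the authors' code).
# Assumptions: 1-D toy dynamics, linear-Gaussian policy, linear dynamics
# models fit by bootstrapped least squares, REINFORCE gradients, and a
# first-order approximation of the meta-gradient.
import numpy as np

rng = np.random.default_rng(0)

# "Real" environment: 1-D point mass, reward = -|state|.
def real_step(s, a):
    return s + 0.1 * a + 0.01 * rng.normal(), -abs(s)

# Gaussian policy with linear mean: a ~ N(theta * s, SIGMA^2).
SIGMA = 0.5

def sample_action(theta, s):
    return theta * s + SIGMA * rng.normal()

def log_prob_grad(theta, s, a):
    # d/d_theta of log N(a; theta * s, SIGMA^2)
    return (a - theta * s) * s / SIGMA**2

def rollout(theta, step_fn, horizon=20):
    s, traj = rng.normal(), []
    for _ in range(horizon):
        a = sample_action(theta, s)
        s_next, r = step_fn(s, a)
        traj.append((s, a, r, s_next))
        s = s_next
    return traj

def policy_gradient(theta, step_fn, n_rollouts=20):
    # REINFORCE estimate of dJ/dtheta under the given dynamics.
    grad = 0.0
    for _ in range(n_rollouts):
        traj = rollout(theta, step_fn)
        ret = sum(r for _, _, r, _ in traj)
        grad += ret * sum(log_prob_grad(theta, s, a) for s, a, _, _ in traj)
    return grad / n_rollouts

def fit_model(data):
    # One ensemble member: s' = w*s + b*a, fit on a bootstrap resample of
    # the real transitions so the models disagree where data is scarce.
    idx = rng.integers(len(data), size=len(data))
    X = np.array([[data[i][0], data[i][1]] for i in idx])
    y = np.array([data[i][3] for i in idx])
    w, b = np.linalg.lstsq(X, y, rcond=None)[0]
    return lambda s, a, w=w, b=b: (w * s + b * a, -abs(s))

theta, K, alpha, beta = 0.0, 5, 0.05, 0.01
replay = []

for iteration in range(30):
    # 1. Collect real-world transitions with the current policy.
    replay += rollout(theta, real_step, horizon=50)

    # 2. Fit an ensemble of K learned simulators.
    models = [fit_model(replay) for _ in range(K)]

    # 3. Inner step: adapt the policy to each learned simulator (its "task").
    adapted = [theta + alpha * policy_gradient(theta, m) for m in models]

    # 4. Meta step (first-order): move theta so that one adaptation step
    #    performs well on every model in the ensemble.
    meta_grad = np.mean([policy_gradient(t_k, m)
                         for t_k, m in zip(adapted, models)])
    theta += beta * meta_grad
```

The full MB-MPO meta-objective differentiates through the inner adaptation step (as in MAML); the first-order update above skips the second-order terms purely to keep the sketch short.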
Clavera, Ignasi and Rothfuss, Jonas and Schulman, John and Fujita, Yasuhiro and Asfour, Tamim and Abbeel, Pieter
Conference on Robot Learning, 2018


Summary by Joseph Paul Cohen