Model-Based Reinforcement Learning via Meta-Policy Optimization on ShortScience.org

Model-Based Reinforcement Learning via Meta-Policy Optimization
Clavera, Ignasi and Rothfuss, Jonas and Schulman, John and Fujita, Yasuhiro and Asfour, Tamim and Abbeel, Pieter
Conference on Robot Learning - 2018 via Local Bibsonomy
Keywords: dblp

Summaries/Notes 1

Summary by Joseph Paul Cohen 4 years ago

In model-based RL, learned dynamics models are imperfect. As a result, the learned policy often overfits to the learned dynamics model: it does well in the learned simulator but not in the real world.
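A minimal sketch of this failure mode follows. It rests on hypothetical assumptions that do not come from the paper:

* A 1-D linear system with made-up constants `A_TRUE` and `B_TRUE` stands in for the real world.
* A least-squares fit on five noisy transitions stands in for the learned dynamics model.
* A grid search over the gain `k` of a linear policy `u = -k*s` stands in for policy optimization.

Compare the two printed costs for `k_model` to see how much of its apparent performance in the learned simulator carries over to the true system. By construction, `k_true` can never do worse than `k_model` on the true system.

```python
import numpy as np

# Hypothetical toy "real world": s_{t+1} = A*s_t + B*u_t (+ noise in the data).
A_TRUE, B_TRUE = 1.2, 0.5
HORIZON = 20


def rollout_cost(a, b, k, s0=1.0, r=0.1):
    """Quadratic cost of the linear policy u = -k*s simulated under dynamics (a, b)."""
    s, cost = s0, 0.0
    for _ in range(HORIZON):
        u = -k * s
        cost += s**2 + r * u**2
        s = a * s + b * u
    return cost


def fit_model(states, actions, next_states):
    """Least-squares estimate of (A, B) from observed transitions."""
    X = np.column_stack([states, actions])
    (a_hat, b_hat), *_ = np.linalg.lstsq(X, next_states, rcond=None)
    return a_hat, b_hat


rng = np.random.default_rng(0)
# Only a handful of noisy transitions, so the learned model will be imperfect.
s = rng.normal(size=5)
u = rng.normal(size=5)
s_next = A_TRUE * s + B_TRUE * u + rng.normal(scale=0.3, size=5)
a_hat, b_hat = fit_model(s, u, s_next)

gains = np.linspace(0.0, 4.0, 401)
# Policy "trained" purely inside the learned simulator (grid search stands in for RL).
k_model = gains[np.argmin([rollout_cost(a_hat, b_hat, k) for k in gains])]
# Best gain on the true system, for reference.
k_true = gains[np.argmin([rollout_cost(A_TRUE, B_TRUE, k) for k in gains])]

print(f"learned model: A={a_hat:.2f}, B={b_hat:.2f}")
print(f"k_model={k_model:.2f}: cost in model {rollout_cost(a_hat, b_hat, k_model):.3f}, "
      f"on true system {rollout_cost(A_TRUE, B_TRUE, k_model):.3f}")
print(f"k_true={k_true:.2f}: cost on true system {rollout_cost(A_TRUE, B_TRUE, k_true):.3f}")
```

How large the gap is depends on how much data the model sees. With more transitions, `a_hat` and `b_hat` approach the true constants and `k_model` approaches `k_true`.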

Key solution idea: there is no need to learn one accurate simulator. Instead, we can learn an ensemble of dynamics models that together sufficiently represent the space of plausible dynamics. Using the ensemble as many learned simulators lets us denoise estimates of a policy's performance. In a meta-learning sense, these simulators become the tasks, and the real world is then just one more task, to which the policy can adapt quickly. One experimental observation is that at the start of training there is a lot of variation between the learned simulators, and they come together as training progresses. This may also indicate that the approach improves exploration.
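The ensemble-as-tasks idea can be sketched on a toy problem. This is a hypothetical illustration, not the paper's implementation, and it makes these substitutions:

* The ensemble consists of least-squares fits of a made-up 1-D linear system, each trained on a bootstrap resample of the same data.
* Gradients are central finite differences rather than sampled policy-gradient estimates.
* The names `fit_ensemble`, `adapt` and `meta_objective` are invented for this sketch.

Each ensemble member plays the role of one task. The inner step adapts the policy gain to a single model. The outer step updates the pre-adaptation gain so that the average post-adaptation cost across the ensemble decreases, in the style of MAML. Finally, the true system is treated as one more task and adapted to with a single step.

```python
import numpy as np

# Hypothetical toy problem (not from the paper): 1-D linear dynamics
# s_{t+1} = A*s_t + B*u_t (+ noise in the data) and a linear policy u = -k*s.
A_TRUE, B_TRUE = 1.2, 0.5
HORIZON = 20


def rollout_cost(a, b, k, s0=1.0, r=0.1):
    """Quadratic cost of u = -k*s simulated under dynamics (a, b)."""
    s, cost = s0, 0.0
    for _ in range(HORIZON):
        u = -k * s
        cost += s**2 + r * u**2
        s = a * s + b * u
    return cost


def fd_grad(f, k, eps=1e-4):
    """Central finite difference; a stand-in for a policy-gradient estimate."""
    return (f(k + eps) - f(k - eps)) / (2 * eps)


def fit_ensemble(s, u, s_next, n_models, rng):
    """Each model is a least-squares fit on a bootstrap resample of the data."""
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(s), size=len(s))
        X = np.column_stack([s[idx], u[idx]])
        (a_hat, b_hat), *_ = np.linalg.lstsq(X, s_next[idx], rcond=None)
        models.append((a_hat, b_hat))
    return models


def adapt(k, model, alpha):
    """Inner loop: one gradient step on a single model, i.e. one 'task'."""
    a, b = model
    return k - alpha * fd_grad(lambda kk: rollout_cost(a, b, kk), k)


def meta_objective(k, models, alpha):
    """Post-adaptation cost averaged over the ensemble (MAML-style objective)."""
    return float(np.mean([rollout_cost(*m, adapt(k, m, alpha)) for m in models]))


rng = np.random.default_rng(0)
s = rng.normal(size=50)
u = rng.normal(size=50)
s_next = A_TRUE * s + B_TRUE * u + rng.normal(scale=0.3, size=50)
models = fit_ensemble(s, u, s_next, n_models=5, rng=rng)

alpha, beta = 0.05, 0.05  # inner (adaptation) and outer (meta) step sizes
k_meta = k_init = 1.0
for _ in range(200):
    # Outer loop: the finite difference differentiates *through* the inner step.
    k_meta -= beta * fd_grad(lambda kk: meta_objective(kk, models, alpha), k_meta)

# The true system is treated as one more task: adapt with a single step.
k_real = adapt(k_meta, (A_TRUE, B_TRUE), alpha)
print(f"meta-objective: {meta_objective(k_init, models, alpha):.3f} -> "
      f"{meta_objective(k_meta, models, alpha):.3f}")
print(f"cost on true system after one adaptation step: "
      f"{rollout_cost(A_TRUE, B_TRUE, k_real):.3f}")
```

Averaging over the ensemble is what denoises the performance estimate. A gain that only looks good in one learned model is penalized by the others. In the sketch, the adaptation step on the true system uses the exact true cost. In practice, that gradient would have to be estimated from real rollouts.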

This summary was written with the help of Pieter Abbeel.