Learning Latent Dynamics for Planning from PixelsLearning Latent Dynamics for Planning from PixelsDanijar Hafner and Timothy Lillicrap and Ian Fischer and Ruben Villegas and David Ha and Honglak Lee and James Davidson2018
Paper summarywassname**Summary**: This paper presents three tricks that make model-based reinforcement more reliable when tested in tasks that require walking and balancing. The tricks are 1) are planning based on features, 2) using a recursive network that mixes probabilistic and deterministic information, and 3) looking forward multiple steps.
Imagine playing pool, armed with a tablet that can predict exactly where the ball will bounce, and the next bounce, and so on. That would be a huge advantage to someone learning pool, however small inaccuracies in the model could mislead you especially when thinking ahead to the 2nd and third bounce.
The tablet is analogous to the dynamics model in model-based reinforcement learning (RL). Model based RL promises to solve a lot of the open problems with RL, letting the agent learn with less experience, transfer well, dream, and many others advantages. Despite the promise, dynamics models are hard to get working: they often suffer from even small inaccuracies, and need to be redesigned for specific tasks.
Enter PlaNet, a clever name, and a net that plans well in range of environments. To increase the challenge the model must predict directly from pixels in fairly difficult tasks such as teaching a cheetah to run or balancing a ball in a cup.
How do they do this? Three main tricks.
- Planning in latest space: this means that the policy network doesn't need to look at the raw image, but looks at a summary of it as represented by a feature vector.
- Recurrent state space models: They found that probabilistic information helps describe the space of possibilities but makes it harder for their RNN based model to look back multiple steps. However mixing probabilistic information and deterministic information gives it the best of both worlds, and they have results that show a starting performance increase when both compared to just one.
- Latent overshooting: They train the model to look more than one step ahead, this helps prevent errors that build up over time
Overall this paper shows great results that tackle the shortfalls of model based RL. I hope the results remain when tested on different and more complex environments.
Learning Latent Dynamics for Planning from Pixels
arXiv e-Print archive - 2018 via Local arXiv
cs.LG, cs.AI, stat.ML
First published: 2018/11/12 (7 months ago) Abstract: Planning has been very successful for control tasks with known environment
dynamics. To leverage planning in unknown environments, the agent needs to
learn the dynamics from interactions with the world. However, learning dynamics
models that are accurate enough for planning has been a long-standing
challenge, especially in image-based domains. We propose the Deep Planning
Network (PlaNet), a purely model-based agent that learns the environment
dynamics from pixels and chooses actions through online planning in latent
space. To achieve high performance, the dynamics model must accurately predict
the rewards ahead for multiple time steps. We approach this problem using a
latent dynamics model with both deterministic and stochastic transition
function and a generalized variational inference objective that we name latent
overshooting. Using only pixel observations, our agent solves continuous
control tasks with contact dynamics, partial observability, and sparse rewards.
PlaNet uses significantly fewer episodes and reaches final performance close to
and sometimes higher than top model-free algorithms.