Inverse Reward Design
Paper summary

The method they use basically tells the robot to reason as follows:

1. The human gave me a reward function $\tilde{r}$, selected in order to get me to behave the way they wanted.
2. So I should favor reward functions which produce that kind of behavior.

This amounts to doing RL (step 1) followed by IRL on the learned policy (step 2); see the final paragraph of Section 4.
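The two-step reasoning above can be sketched on a toy problem. This is a minimal illustration, not the paper's implementation. It assumes rewards are linear in trajectory features, a small finite set of candidate reward weights (which also stands in for the set of proxies the designer could have chosen), a uniform prior, and "RL" replaced by picking the best of a few enumerated trajectories. The names `best_trajectory_features`, `ird_posterior`, and `beta`, and all the numbers, are hypothetical.

```python
import numpy as np

def best_trajectory_features(w, traj_features):
    """Step 1 (stand-in for RL): features of the trajectory that maximizes reward w."""
    returns = traj_features @ w
    return traj_features[np.argmax(returns)]

def ird_posterior(w_proxy, candidates, traj_features, beta=1.0):
    """Step 2 (IRL-like): posterior over candidate true rewards, given the proxy."""
    # Behavior that the given proxy reward induces in the training environment.
    phi_proxy = best_trajectory_features(w_proxy, traj_features)
    # Behavior each alternative proxy would have induced (assumed finite proxy set).
    phi_all = np.array([best_trajectory_features(w, traj_features) for w in candidates])

    log_post = np.empty(len(candidates))
    for i, w_true in enumerate(candidates):
        # A proxy is likely under w_true if the behavior it produces scores well
        # under w_true, relative to the behavior other proxies would produce.
        scores = beta * (phi_all @ w_true)
        log_norm = np.logaddexp.reduce(scores)
        log_post[i] = beta * (phi_proxy @ w_true) - log_norm

    # Uniform prior (an assumption), so normalizing the likelihoods gives the posterior.
    return np.exp(log_post - np.logaddexp.reduce(log_post))

# Training environment: two possible trajectories over three features.
# The third feature never occurs during training.
traj_features = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
])

candidates = np.array([
    [1.0, 0.0, 0.0],    # the proxy the designer actually wrote down
    [1.0, 0.0, -10.0],  # same training behavior, third feature is very bad
    [1.0, 0.0, 10.0],   # same training behavior, third feature is very good
    [0.0, 1.0, 0.0],    # would produce different training behavior
])

posterior = ird_posterior(candidates[0], candidates, traj_features)
print(posterior)
```

In this toy setting the first three candidates produce the same behavior in training, so they end up with equal posterior mass, while the fourth is down-weighted. The proxy therefore says nothing about the feature that never appeared in training, which is the kind of uncertainty the robot is meant to retain.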
Inverse Reward Design
Dylan Hadfield-Menell and Smitha Milli and Pieter Abbeel and Stuart Russell and Anca Dragan
arXiv e-Print archive, 2017
Keywords: cs.AI, cs.LG


Summary by CodyWild 2 years ago
Summary by capybaralet 2 years ago