Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog
Jaques, Natasha and Ghandeharioun, Asma and Shen, Judy Hanwen and Ferguson, Craig and Lapedriza, Àgata and Jones, Noah and Gu, Shixiang and Picard, Rosalind W.
arXiv e-Print archive - 2019 via Local Bibsonomy



