On-line Active Reward Learning for Policy Optimisation in Spoken Dialogue Systems
Paper summary

The goal is to improve the training process for a spoken dialogue system, specifically a telephone-based system providing restaurant information for the Cambridge (UK) area. The authors train a supervised model that predicts the success of the current dialogue: if the model is certain about the outcome, the predicted label is used as the reward for training the dialogue policy; if the model is uncertain, the user is asked to provide the label instead. This reduces the amount of annotation required, since active learning selects which dialogues need a human label.

https://i.imgur.com/dWY1EdE.png

Each dialogue is mapped to a fixed-length vector representation by a bidirectional LSTM trained as an autoencoder, and a Gaussian Process over these embeddings models dialogue success.
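A minimal sketch of the uncertainty-gated reward labelling loop described above, assuming dialogues are already embedded as fixed-size vectors (the paper uses a bidirectional LSTM autoencoder for that step). The confidence threshold, the `ask_user` callback, and the use of scikit-learn's GaussianProcessClassifier are illustrative stand-ins for the paper's on-line GP model, not the authors' implementation.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

CONFIDENCE_THRESHOLD = 0.85  # hypothetical cut-off for trusting the model


class ActiveRewardModel:
    def __init__(self):
        self.gp = GaussianProcessClassifier(kernel=RBF(length_scale=1.0))
        self.X, self.y = [], []

    def reward_label(self, dialogue_vec, ask_user):
        """Return a success/failure label (1/0) for a finished dialogue,
        querying the user only when the GP is uncertain."""
        # The GP can only make predictions once it has seen both classes.
        if len(set(self.y)) >= 2:
            p_success = self.gp.predict_proba([dialogue_vec])[0, 1]
            confidence = max(p_success, 1.0 - p_success)
            if confidence >= CONFIDENCE_THRESHOLD:
                # Certain: use the predicted label as the policy's reward.
                return int(p_success >= 0.5)
        # Uncertain (or too little data): ask the user for a label, store
        # the example, and refit the success model on all annotated data.
        label = ask_user()
        self.X.append(dialogue_vec)
        self.y.append(label)
        self.gp.fit(np.array(self.X), np.array(self.y))
        return label
```

Here `ask_user` is whatever routine prompts the caller for a success judgement at the end of the call; refitting the GP on every new annotation is a simple substitute for the on-line updates used in the paper.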
Su, Pei-Hao; Gašić, Milica; Mrkšić, Nikola; Rojas-Barahona, Lina Maria; Ultes, Stefan; Vandyke, David; Wen, Tsung-Hsien; Young, Steve J.
Association for Computational Linguistics, 2016


Summary by Marek Rei

