Learning Online Alignments with Continuous Rewards Policy Gradient
Paper summary

TL;DR: The authors use policy gradients on an RNN to train a "hard" attention mechanism that decides whether or not to emit an output at the current timestep. Their algorithm is online, meaning it does not need to see the complete input sequence before making a prediction, as is the case with soft attention. The authors evaluate their model on small- and medium-scale speech recognition tasks, where it achieves performance comparable to standard sequential models.

#### Notes:

- Entropy regularization and baselines were critical to make the model learn (see the sketch below)
- Neat trick: increase dropout as training progresses
- Grid LSTMs outperformed standard LSTMs
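To make the training signal concrete, here is a minimal, hypothetical sketch (not the authors' implementation) of a per-timestep Bernoulli "emit" decision trained with REINFORCE, using a baseline and an entropy bonus as mentioned in the notes above. The names (`EmitPolicy`, `reinforce_loss`) and all hyperparameters are assumptions for illustration only.

```python
# Hypothetical sketch: hard emit/no-emit decisions trained with REINFORCE,
# a baseline to reduce gradient variance, and an entropy bonus to keep the
# policy from collapsing early. Not the authors' code.
import torch
import torch.nn as nn

class EmitPolicy(nn.Module):
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.rnn = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.emit_head = nn.Linear(hidden_dim, 1)  # per-timestep emit probability

    def forward(self, x):
        h, _ = self.rnn(x)                          # (batch, time, hidden)
        return torch.sigmoid(self.emit_head(h)).squeeze(-1)  # (batch, time)

def reinforce_loss(emit_probs, rewards, baseline, entropy_weight=0.01):
    """REINFORCE with a baseline and entropy regularization (illustration only)."""
    dist = torch.distributions.Bernoulli(probs=emit_probs)
    actions = dist.sample()                         # hard emit / no-emit decisions
    advantage = rewards - baseline                  # baseline reduces variance
    # Minimize negative expected reward, plus a bonus for keeping entropy high.
    loss = -(dist.log_prob(actions) * advantage).mean()
    loss = loss - entropy_weight * dist.entropy().mean()
    return loss, actions
```

In this sketch the baseline (e.g. a running average of recent rewards) reduces the variance of the policy-gradient estimate, and the entropy term keeps the emit decisions stochastic early in training, which matches the note that both were critical for learning.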
Yuping Luo, Chung-Cheng Chiu, Navdeep Jaitly, and Ilya Sutskever
arXiv e-Print archive, 2016
Keywords: cs.LG, cs.CL
