Neural Machine Translation with Recurrent Attention Modeling
Paper summary

TLDR; The standard attention model does not take into account the "history" of attention activations, even though this should be a good predictor of what to attend to next. The authors augment a seq2seq network with a dynamic memory that, for each input, keeps track of an attention matrix over time (a minimal sketch follows the notes below). The model is evaluated on English-German and English-Chinese NMT tasks and beats competing models.

#### Notes

- How expensive is this, and how much more difficult are these networks to train?
- Sequentially attending to neighboring words makes sense for some language pairs, but not for others. The method seems rather restricted because it only takes into account a window of k time steps.
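For intuition, here is a minimal NumPy sketch of the general idea: attention scores at each decoding step also condition on the attention mass previously placed on a ±k window of neighboring source positions. This is my own illustration, not the paper's parameterization; the weight names (`W_d`, `W_e`, `w_h`, `v`), the additive scoring form, and the scalar windowed sum over the history are all assumptions.

```python
# Illustrative sketch only: attention that conditions on its own history
# within a +/-k window of source positions. Names and shapes are assumptions,
# not the paper's exact model.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend_with_history(dec_state, enc_states, attn_history,
                        W_d, W_e, w_h, v, k=3):
    """Attention weights that also depend on past attention activations.

    dec_state:    (d_dec,)          current decoder hidden state
    enc_states:   (src_len, d_enc)  encoder hidden states
    attn_history: (t, src_len)      attention weights from earlier decoding steps
    k:            window of neighboring source positions whose past attention
                  is summarized for each position j
    """
    src_len = enc_states.shape[0]
    scores = np.empty(src_len)
    for j in range(src_len):
        lo, hi = max(0, j - k), min(src_len, j + k + 1)
        # Scalar summary of attention previously placed on positions j-k .. j+k.
        hist = attn_history[:, lo:hi].sum() if attn_history.size else 0.0
        scores[j] = v @ np.tanh(W_d @ dec_state + W_e @ enc_states[j] + w_h * hist)
    return softmax(scores)

# Toy usage: decode three steps, feeding the growing attention history back in.
rng = np.random.default_rng(0)
d_dec, d_enc, d_att, src_len = 4, 4, 8, 6
W_d = rng.normal(size=(d_att, d_dec))
W_e = rng.normal(size=(d_att, d_enc))
w_h = rng.normal(size=d_att)
v = rng.normal(size=d_att)
enc_states = rng.normal(size=(src_len, d_enc))
history = np.zeros((0, src_len))
for t in range(3):
    alpha = attend_with_history(rng.normal(size=d_dec), enc_states,
                                history, W_d, W_e, w_h, v, k=2)
    history = np.vstack([history, alpha])
    print(np.round(alpha, 3))
```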
Zichao Yang and Zhiting Hu and Yuntian Deng and Chris Dyer and Alex Smola
arXiv e-Print archive, 2016
Keywords: cs.NE, cs.CL
