Neural Machine Translation by Jointly Learning to Align and Translate
Paper summary

One core aspect of this attention approach is that it allows the learned representation to be inspected by visualizing the softmax output (later called $\alpha_{ij}$) over the input words for each output word, as shown below.

https://i.imgur.com/Kb7bk3e.png

In this approach, at each decoding step the RNN attends over all of the encoder's hidden states (so the input length can vary), applies a softmax, and uses the resulting probabilities to form a weighted sum of those states. This weighted sum acts as the memory each step uses to make a prediction, which bypasses the need for the network to encode the entire input into the single state passed between units.

Each decoder hidden state is computed as:

$$s_i = f(s_{i-1}, y_{i-1}, c_i)$$

where $s_{i-1}$ is the previous decoder state and $y_{i-1}$ is the previous target word. Their contribution is $c_i$, the context vector that carries the memory of the input phrase:

$$c_i = \sum_{j=1}^{T_x} \alpha_{ij} h_j$$

Here $\alpha_{ij}$ is the softmax output for the $j$th element of the input sequence, and $h_j$ is the encoder's hidden state after processing that $j$th element.
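The context-vector computation above can be sketched in a few lines of plain Python. This is a minimal illustration, not the paper's implementation: the dot-product `score` below stands in for the paper's learned feed-forward alignment model $a(s_{i-1}, h_j)$, and all names and dimensions are made up for the example.

```python
import math
import random

def softmax(xs):
    # numerically stable softmax over a list of scores
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_context(s_prev, encoder_states, score):
    # alpha_ij = softmax_j( score(s_{i-1}, h_j) )
    alphas = softmax([score(s_prev, h) for h in encoder_states])
    d = len(encoder_states[0])
    # c_i = sum_j alpha_ij * h_j  -- weighted sum of encoder states
    c = [sum(a * h[k] for a, h in zip(alphas, encoder_states))
         for k in range(d)]
    return c, alphas

random.seed(0)
T, d = 5, 4  # illustrative input length and hidden size
encoder_states = [[random.gauss(0, 1) for _ in range(d)] for _ in range(T)]
s_prev = [random.gauss(0, 1) for _ in range(d)]

# dot-product score is a stand-in for the learned alignment model
c_i, alphas = attention_context(s_prev, encoder_states, dot)
print(len(c_i), round(sum(alphas), 6))
```

Note that because the weights come from a softmax, they sum to 1, so $c_i$ is a convex combination of the encoder states regardless of the input length $T_x$.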
Bahdanau, Dzmitry and Cho, Kyunghyun and Bengio, Yoshua
arXiv e-Print archive - 2014 via Bibsonomy
Keywords: dblp