Paper summary

### Read-Again

Two options:

* GRU: run a pass of a regular GRU over the input text $x_1,\ldots,x_n$. Use its hidden states $h_1,\ldots,h_n$ to compute a weight vector for every step $i$: $\alpha_i = \tanh \left( W_e h_i + U_e h_n + V_e x_i \right)$. Then run a second GRU pass over the same input text. In the second pass, the weights $\alpha_i$ from the first pass are multiplied with the second pass's internal GRU update gate $z_i$, which controls whether the hidden state is directly copied.
* LSTM: concatenate the hidden states from the first pass with the input text, $\left[ x_i, h_i, h_n \right]$, and run a second pass on this new input.

With multiple sentences, the passes above are done per sentence. In addition, the final state $h^s_n$ of each sentence $s$ is concatenated either with the final states $h^{s'}_n$ of the other sentences or with $\tanh \left( \sum_s V_s h_s + v \right)$.

### Decoder with copy mechanism

The decoder is an LSTM with hidden state $s_t$. Its input is the previously generated word $y_{t-1}$ and a context vector computed with an attention mechanism: $c_t = \sum_{i=1}^n \beta_{it} h_i$, where $h_i$ are the hidden states of the encoder's second pass. The weights are $\beta_{it} = \text{softmax}_i \left( v_a^T \tanh \left( W_a s_{t-1} + U_a h_i \right) \right)$, with the softmax taken over the input positions $i = 1,\ldots,n$.

The decoder vocabulary $Y$ is small. If $y_{t-1}$ does not appear in $Y$ but does appear in the input as $x_i$, its embedding is replaced with $p_t = \tanh \left( W_c h_i + b_c \right)$; otherwise the word is mapped to `<UNK>`. $p_t$ is also used to copy words from the input to the output (details not given).

### Experiments

Abstractive summarization on the [DUC2003 and DUC2004 competitions](http://www-nlpir.nist.gov/projects/duc/data.html) data.
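The GRU variant of Read-Again, together with the decoder's attention weights and its input-embedding rule summarized above, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Weights are random and untrained, vectors are Python lists, and only a single sentence and the GRU variant are covered. The helper names (`GRU`, `read_again_gru`, `attention`, `decoder_input_embedding`, `rand_matrix`) are hypothetical. The summary only says that $\alpha_i$ is multiplied with $z_i$. Here that is assumed to be an elementwise product inside the standard update $h_i = (1 - z_i) \odot h_{i-1} + z_i \odot \tilde{h}_i$. The LSTM variant and the copy-to-output step (details not given) are not sketched.

```python
import math
import random

# --- tiny list-based linear algebra (vectors are lists, matrices are lists of rows) ---
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def add(*vs):
    return [sum(xs) for xs in zip(*vs)]

def mul(a, b):
    return [x * y for x, y in zip(a, b)]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]

def tanh(v):
    return [math.tanh(x) for x in v]

def rand_matrix(rng, rows, cols):
    # hypothetical helper: untrained random weights, for illustration only
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

class GRU:
    """Standard GRU; optional per-step `alphas` rescale the update gate z_i."""
    def __init__(self, rng, d_in, d_h):
        self.d_h = d_h
        self.Wz, self.Uz = rand_matrix(rng, d_h, d_in), rand_matrix(rng, d_h, d_h)
        self.Wr, self.Ur = rand_matrix(rng, d_h, d_in), rand_matrix(rng, d_h, d_h)
        self.Wh, self.Uh = rand_matrix(rng, d_h, d_in), rand_matrix(rng, d_h, d_h)

    def run(self, xs, alphas=None):
        h, hs = [0.0] * self.d_h, []
        for i, x in enumerate(xs):
            z = sigmoid(add(matvec(self.Wz, x), matvec(self.Uz, h)))
            if alphas is not None:
                z = mul(alphas[i], z)  # assumption: alpha_i scales z_i elementwise
            r = sigmoid(add(matvec(self.Wr, x), matvec(self.Ur, h)))
            h_cand = tanh(add(matvec(self.Wh, x), matvec(self.Uh, mul(r, h))))
            # (1 - z) keeps the previous state, z mixes in the candidate
            h = add(mul([1.0 - zj for zj in z], h), mul(z, h_cand))
            hs.append(h)
        return hs

def read_again_gru(xs, d_h=4, seed=0):
    """Two GRU passes over one sentence; returns second-pass states h_1..h_n."""
    rng = random.Random(seed)
    d_in = len(xs[0])
    first, second = GRU(rng, d_in, d_h), GRU(rng, d_in, d_h)
    We, Ue = rand_matrix(rng, d_h, d_h), rand_matrix(rng, d_h, d_h)
    Ve = rand_matrix(rng, d_h, d_in)
    h1 = first.run(xs)
    hn = h1[-1]
    # alpha_i = tanh(W_e h_i + U_e h_n + V_e x_i)
    alphas = [tanh(add(matvec(We, h), matvec(Ue, hn), matvec(Ve, x)))
              for h, x in zip(h1, xs)]
    return second.run(xs, alphas)

def attention(s_prev, hs, Wa, Ua, va):
    """beta_it = softmax_i(v_a^T tanh(W_a s_{t-1} + U_a h_i)); c_t = sum_i beta_it h_i."""
    ws = matvec(Wa, s_prev)
    scores = [sum(a * e for a, e in zip(va, tanh(add(ws, matvec(Ua, h))))) for h in hs]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(e - m) for e in scores]
    total = sum(exps)
    betas = [e / total for e in exps]
    c = [sum(b * h[k] for b, h in zip(betas, hs)) for k in range(len(hs[0]))]
    return betas, c

def decoder_input_embedding(prev_word, vocab_emb, src_words, hs, Wc, bc, unk="<UNK>"):
    """Embedding fed to the decoder for y_{t-1}."""
    if prev_word in vocab_emb:          # in the small decoder vocabulary Y
        return vocab_emb[prev_word]
    if prev_word in src_words:          # out of Y but present in the input
        i = src_words.index(prev_word)  # assumption: use the first occurrence
        return tanh(add(matvec(Wc, hs[i]), bc))  # p_t = tanh(W_c h_i + b_c)
    return vocab_emb[unk]
```

For example, `read_again_gru([[0.1, -0.2, 0.3], [0.0, 0.5, -0.1]])` returns two 4-dimensional second-pass states. These can be passed as `hs` to `attention` together with a decoder state `s_prev`, and to `decoder_input_embedding` when the previous output word is outside the decoder vocabulary.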
Wenyuan Zeng and Wenjie Luo and Sanja Fidler and Raquel Urtasun
arXiv e-Print archive - 2016 via Local arXiv
Keywords: cs.CL


Summary by Udibr 4 years ago
