Language Modeling with Gated Convolutional Networks
Paper summary

This paper introduces a language model that uses a convolutional approach instead of LSTMs.

## General Language modeling

Statistical language models estimate the probability distribution of a sequence of words. They are important for ASR (automatic speech recognition) and translation. The usual approach is to embed words into $\mathbb{R}^n$ and then apply RNNs to the vector sequences.

## Evaluation

* WikiText-103: perplexity of 44.9 (lower is better)
* new best single-GPU result on the Google Billion Word benchmark: perplexity of 43.9

## Idea

* uses Gated Linear Units (GLU)
* uses pre-activation residual blocks
* adaptive softmax
* no tanh in the gating mechanism
* uses gradient clipping

## See also

* Reddit
* Improving Neural Language Models with a Continuous Cache: test perplexity of **40.8 on WikiText-103**
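
## Sketch: gated linear unit

The gating idea listed under "Idea" can be illustrated with a small example. The sketch below assumes the GLU form $h(X) = (X * W + b) \otimes \sigma(X * V + c)$, where $*$ is a causal convolution; the linear path has no tanh, which is what "no tanh in the gating mechanism" refers to. It is deliberately simplified to a single scalar channel in pure Python, and the function names (`causal_conv1d`, `glu_layer`) are hypothetical, not taken from the paper's code.

```python
import math


def sigmoid(x):
    """Logistic function used for the gate."""
    return 1.0 / (1.0 + math.exp(-x))


def causal_conv1d(xs, kernel, bias):
    """Single-channel causal convolution.

    Output position t depends only on inputs t-k+1 .. t, so the model
    cannot look at future words. kernel[-1] multiplies the current input.
    """
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(xs)  # left-pad so no future input is visible
    return [
        sum(kernel[j] * padded[t + j] for j in range(k)) + bias
        for t in range(len(xs))
    ]


def glu_layer(xs, w, b, v, c):
    """Gated linear unit: (xs * w + b) multiplied elementwise by sigmoid(xs * v + c)."""
    linear = causal_conv1d(xs, w, b)  # linear path, no tanh applied
    gate = [sigmoid(g) for g in causal_conv1d(xs, v, c)]  # values in (0, 1)
    return [a * g for a, g in zip(linear, gate)]


if __name__ == "__main__":
    # With a zero gate kernel and zero gate bias, every gate value is 0.5,
    # so the output is half of the linear path.
    print(glu_layer([1.0, 2.0, 3.0], w=[0.0, 1.0], b=0.0, v=[0.0, 0.0], c=0.0))
```

A full layer would use embedding vectors with many channels and learned kernels; the scalar version only shows how the sigmoid gate scales a linear convolution output and why left padding keeps the model causal.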
Language Modeling with Gated Convolutional Networks
Yann N. Dauphin and Angela Fan and Michael Auli and David Grangier
arXiv e-Print archive - 2016 via Local arXiv
Keywords: cs.CL


Summary by Martin Thoma 4 years ago