# Language Modeling with Gated Convolutional Networks
## Paper summary

This paper introduces a model for language modeling that uses a convolutional approach instead of recurrent networks such as LSTMs.

## General language modeling

Statistical language models estimate the probability distribution of a sequence of words. They are important for ASR (automatic speech recognition) and machine translation. The usual approach is to embed words into $\mathbb{R}^n$ and then apply RNNs to the resulting vector sequences.
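Concretely, such models factor the joint probability of a sequence $w_1, \dots, w_N$ autoregressively via the chain rule, which is the standard formulation this paper also uses:

$$P(w_1, \dots, w_N) = \prod_{i=1}^{N} P(w_i \mid w_1, \dots, w_{i-1})$$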
## Evaluation

* [WikiText-103](http://metamind.io/research/the-wikitext-long-term-dependency-language-modeling-dataset/): [Perplexity](https://en.wikipedia.org/wiki/Perplexity) of 44.9 (lower is better)
* New best single-GPU result on the Google Billion Word benchmark: perplexity of 43.9
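For reference (this is the standard definition, not specific to this paper), perplexity is the exponentiated average negative log-likelihood of the test tokens, so lower values mean the model assigns higher probability to the data:

$$\text{PPL} = \exp\left( -\frac{1}{N} \sum_{i=1}^{N} \log P(w_i \mid w_1, \dots, w_{i-1}) \right)$$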
## Idea

* uses Gated Linear Units (GLU); see the sketch after this list
* uses pre-activation residual blocks
* adaptive softmax
* no tanh in the gating mechanism
* uses gradient clipping
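Below is a minimal PyTorch sketch of a gated convolutional block computing $h(\mathbf{X}) = (\mathbf{X} * \mathbf{W} + \mathbf{b}) \otimes \sigma(\mathbf{X} * \mathbf{V} + \mathbf{c})$; the class name, kernel size, and channel counts are illustrative assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class GLUBlock(nn.Module):
    """Gated linear unit over a causal 1D convolution (illustrative sketch)."""

    def __init__(self, channels: int, kernel_size: int = 4):
        super().__init__()
        # A single convolution produces both the linear part and the gate.
        self.conv = nn.Conv1d(channels, 2 * channels, kernel_size)
        # Left-pad so position i only sees inputs <= i (causal convolution),
        # as required for autoregressive language modeling.
        self.pad = nn.ConstantPad1d((kernel_size - 1, 0), 0.0)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, channels, seq_len)
        a, b = self.conv(self.pad(x)).chunk(2, dim=1)
        # No tanh on the linear path: gradients flow through `a` unattenuated.
        return a * torch.sigmoid(b)

x = torch.randn(8, 64, 100)  # (batch, channels, seq_len)
y = GLUBlock(64)(x)          # same shape as x
```

PyTorch also ships `torch.nn.functional.glu` for the elementwise gating itself, `torch.nn.AdaptiveLogSoftmaxWithLoss` for the adaptive softmax mentioned above, and `torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)` for gradient clipping during training.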
## See also

* [Reddit discussion](https://www.reddit.com/r/MachineLearning/comments/5kbsjb/r_161208083_language_modeling_with_gated/)
* [Improving Neural Language Models with a Continuous Cache](https://arxiv.org/abs/1612.04426): test perplexity of **40.8 on WikiText-103**
Yann N. Dauphin, Angela Fan, Michael Auli, David Grangier
arXiv e-Print archive, 2016
Keywords: cs.CL

Summary by Martin Thoma