Recurrent Neural Network Regularization
Zaremba, Wojciech and Sutskever, Ilya and Vinyals, Oriol, 2014
Paper summary by shagunsodhani
#### Introduction
* The paper explains how to apply dropout to LSTMs and how it could reduce overfitting in tasks like language modelling, speech recognition, image caption generation and machine translation.
* [Link to the paper](https://arxiv.org/abs/1409.2329)
* Dropout is a regularisation method that drops out (temporarily removes) units from the network, along with all their incoming and outgoing connections.
* Conventional dropout does not work well with RNNs as the recurrence amplifies the noise and hurts learning.
* The paper proposes to apply dropout only to the non-recurrent connections (see the sketch after this list).
* The dropout operator corrupts the information carried by some units (and not all), forcing them to perform their intermediate computations more robustly.
* The information is corrupted L+1 times, where L is the number of layers, and this count is independent of the number of timesteps the information traverses.
* In the context of language modelling, image caption generation, speech recognition and machine translation, dropout enables training larger networks and reduces the testing error in terms of perplexity and frame accuracy.
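Below is a minimal sketch of the scheme described above, assuming PyTorch; the class name, shapes, and hyperparameters are illustrative and not taken from the paper's code. Dropout is applied to the "vertical" inputs flowing between layers at the same timestep, while the recurrent state (h, c) passed along time is never dropped, so information is perturbed L+1 times regardless of sequence length.

```python
import torch
import torch.nn as nn

class NonRecurrentDropoutLSTM(nn.Module):
    """Stacked LSTM with dropout on the non-recurrent (between-layer) connections only."""

    def __init__(self, input_size, hidden_size, num_layers, dropout=0.5):
        super().__init__()
        self.cells = nn.ModuleList(
            [nn.LSTMCell(input_size if l == 0 else hidden_size, hidden_size)
             for l in range(num_layers)]
        )
        self.drop = nn.Dropout(dropout)

    def forward(self, x, states):
        # x: (seq_len, batch, input_size); states: list of (h, c) tuples, one per layer
        outputs = []
        for x_t in x:                                # loop over timesteps
            inp = self.drop(x_t)                     # dropout on the non-recurrent input
            for l, cell in enumerate(self.cells):
                h, c = cell(inp, states[l])          # recurrent (h, c) are never dropped
                states[l] = (h, c)
                inp = self.drop(h)                   # dropout only between layers
            outputs.append(inp)
        return torch.stack(outputs), states

# Toy usage: with 2 layers, each path from input to output passes through the
# dropout operator L + 1 = 3 times per timestep, independent of sequence length.
model = NonRecurrentDropoutLSTM(input_size=10, hidden_size=20, num_layers=2)
x = torch.randn(5, 3, 10)                            # (seq_len=5, batch=3, features=10)
states = [(torch.zeros(3, 20), torch.zeros(3, 20)) for _ in range(2)]
out, states = model(x, states)                       # out: (5, 3, 20)
```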
TLDR; The authors show that applying dropout to only the **non-recurrent** connections (between layers of the same timestep) in an LSTM works well, improving the scores on various sequence tasks.
#### Data Sets and model performance
- PTB Language Modeling Perplexity: 78.4
- Google Icelandic Speech Dataset Frame Accuracy: 70.5
- WMT'14 English to French Machine Translation BLEU: 29.03
- MS COCO Image Caption Generation BLEU: 24.3