Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep LearningDropout as a Bayesian Approximation: Representing Model Uncertainty in Deep LearningGal, Yarin and Ghahramani, Zoubin2015

Paper summaryhlarochelleThis paper presents an interpretation of dropout training as performing approximate Bayesian learning in a deep Gaussian process (DGP) model. This connection suggests a very simple way of obtaining, for networks trained with dropout, estimates of the model's output uncertainty. This estimate is based and computed from an ensemble of networks each obtained by sampling a new dropout mask.
#### My two cents
This is a really nice and thought provoking contribution to our understanding of dropout. Unfortunately, the paper in fact doesn't provide a lot of comparisons with either other ways of estimating the predictive uncertainty of deep networks, or to other approximate inference schemes in deep GPs (actually, see update below). The qualitative examples provided however do suggest that the uncertainty estimate isn't terrible.
Irrespective of the quality of the uncertainty estimate suggested here, I find the observation itself really valuable. Perhaps future research will then shed light on how useful that method is compared to other approaches, including Bayesian dark knowledge \cite{conf/nips/BalanRMW15}.
`Update: On September 27th`, the authors uploaded to arXiv a new version that now includes comparisons with 2 alternative Bayesian learning methods for deep networks, specifically the stochastic variational inference approach of Graves and probabilistic back-propagation of Hernandez-Lobato and Adams. Dropout actually does very well against these baselines and, across datasets, is almost always amongst the best performing method!

This paper presents an interpretation of dropout training as performing approximate Bayesian learning in a deep Gaussian process (DGP) model. This connection suggests a very simple way of obtaining, for networks trained with dropout, estimates of the model's output uncertainty. This estimate is based and computed from an ensemble of networks each obtained by sampling a new dropout mask.
#### My two cents
This is a really nice and thought provoking contribution to our understanding of dropout. Unfortunately, the paper in fact doesn't provide a lot of comparisons with either other ways of estimating the predictive uncertainty of deep networks, or to other approximate inference schemes in deep GPs (actually, see update below). The qualitative examples provided however do suggest that the uncertainty estimate isn't terrible.
Irrespective of the quality of the uncertainty estimate suggested here, I find the observation itself really valuable. Perhaps future research will then shed light on how useful that method is compared to other approaches, including Bayesian dark knowledge \cite{conf/nips/BalanRMW15}.
`Update: On September 27th`, the authors uploaded to arXiv a new version that now includes comparisons with 2 alternative Bayesian learning methods for deep networks, specifically the stochastic variational inference approach of Graves and probabilistic back-propagation of Hernandez-Lobato and Adams. Dropout actually does very well against these baselines and, across datasets, is almost always amongst the best performing method!