A Deep and Tractable Density Estimator. Uria, Benigno and Murray, Iain and Larochelle, Hugo. 2014

#### Problem addressed:
Fully visible Bayesian network learning
#### Summary:
This work extends the original NADE paper. Instead of using a single fully visible Bayesian network (FVBN) with a fixed, randomly chosen variable ordering, the authors train one set of shared parameters to model all of the factorially many possible orderings, by optimizing a stochastic version of the objective that is an unbiased estimator of the average over orderings. The resulting model makes any type of inference easy. In addition, because it is trained on all orderings, an ensemble of NADE models can be obtained at no additional training cost. Training masks out the variables to be predicted and maximizes the likelihood, over the training data, of those missing variables given the observed ones. The model is closely related to a denoising autoencoder with Bernoulli (masking) noise on the input. One drawback of masking is that the network cannot distinguish a masked-out variable from a variable whose value is 0. To overcome this, the authors supply the mask as an additional input to the network and show that this is an important ingredient for the model to work.
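The masked training step described in this summary can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the one-hidden-layer network, the `tanh` activation, and the names `init_params`, `predict`, and `order_agnostic_loss` are all assumptions made for the example. The loss over the hidden variables is rescaled by `D / (D - d + 1)` so that, in expectation over orderings and split points, it matches the full negative log-likelihood.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def init_params(D, H, rng):
    # Hypothetical one-hidden-layer network. The input is [masked x, mask],
    # hence 2 * D input units.
    return {
        "W1": rng.normal(0.0, 0.01, size=(2 * D, H)),
        "b1": np.zeros(H),
        "W2": rng.normal(0.0, 0.01, size=(H, D)),
        "b2": np.zeros(D),
    }

def predict(params, x, mask):
    # Concatenating the mask lets the network tell "masked out" apart
    # from "observed with value 0".
    inp = np.concatenate([x * mask, mask])
    h = np.tanh(inp @ params["W1"] + params["b1"])
    return sigmoid(h @ params["W2"] + params["b2"])

def order_agnostic_loss(params, x, rng):
    D = x.shape[0]
    order = rng.permutation(D)        # random ordering of the variables
    d = rng.integers(1, D + 1)        # split point, uniform in 1..D
    mask = np.zeros(D)
    mask[order[: d - 1]] = 1.0        # the first d-1 variables are observed
    p = predict(params, x, mask)
    eps = 1e-12
    nll = -(x * np.log(p + eps) + (1.0 - x) * np.log(1.0 - p + eps))
    hidden = 1.0 - mask               # the D-d+1 variables to be predicted
    loss = D / (D - d + 1) * np.sum(nll * hidden)
    return loss, mask

rng = np.random.default_rng(0)
params = init_params(D=6, H=4, rng=rng)
x = np.array([1.0, 0.0, 1.0, 1.0, 0.0, 0.0])
loss, mask = order_agnostic_loss(params, x, rng)
```

With near-zero initial weights every prediction is close to 0.5, so the rescaled loss is close to `6 * log(2)` whatever split point is drawn, which is a quick sanity check on the scaling factor.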
#### Novelty:
Proposes order-agnostic NADE, which overcomes several drawbacks of the original NADE.
#### Drawbacks:
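Inference at test time is relatively expensive: evaluating the exact likelihood under one ordering takes one network pass per input dimension, and averaging over an ensemble of orderings multiplies that cost.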
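To make the test-time cost concrete, here is a hedged sketch of how a likelihood could be evaluated with such a model. It assumes a generic `predict_fn(x_masked, mask)` that returns one Bernoulli probability per dimension. That callable, the function names, and the choice to combine orderings by averaging their probabilities are illustrative assumptions, not the authors' code. Each ordering costs `D` forward passes, so an ensemble of `K` orderings costs `K * D`.

```python
import numpy as np

def log_likelihood_one_ordering(predict_fn, x, order):
    # Chain rule under a fixed ordering: reveal one variable at a time.
    D = x.shape[0]
    mask = np.zeros(D)
    total = 0.0
    for i in order:
        p = predict_fn(x * mask, mask)[i]   # one forward pass per dimension
        total += np.log(p) if x[i] == 1 else np.log(1.0 - p)
        mask[i] = 1.0                       # variable i is now observed
    return total

def ensemble_log_likelihood(predict_fn, x, num_orderings, rng):
    # Average the probabilities (not the log-probabilities) over orderings,
    # computed stably in log space.
    D = x.shape[0]
    lls = np.array([
        log_likelihood_one_ordering(predict_fn, x, rng.permutation(D))
        for _ in range(num_orderings)
    ])
    return np.logaddexp.reduce(lls) - np.log(num_orderings)

# Usage with a placeholder predictor that always outputs 0.5.
def uniform_predictor(x_masked, mask):
    return np.full(x_masked.shape[0], 0.5)

x = np.array([1.0, 0.0, 1.0, 1.0])
ll = ensemble_log_likelihood(uniform_predictor, x, 3, np.random.default_rng(0))
```

With the placeholder predictor every ordering gives the same value, `4 * log(0.5)`. A trained network would generally give different values per ordering, which is what makes the ensemble useful.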
#### Datasets:
UCI, binary MNIST
#### Additional remarks:
#### Resources:
The first author provided the implementation on his website
#### Presenter:
Yingbo Zhou
