Multi-Prediction Deep Boltzmann Machines
Goodfellow, Ian J. and Mirza, Mehdi and Courville, Aaron C. and Bengio, Yoshua, 2013
Paper summary (nipsreviews)
The paper presents a method that learns layers of representation and answers queries about missing values, in both the inputs and the labels, within a single training procedure, unlike methods such as deep Boltzmann machines (DBMs). The model is a recurrent net that performs the same operations as mean-field inference in a DBM, trained to predict a subset of the inputs from its complement. Parts of the paper are poorly written, especially the model explanation and the multi-inference section; nevertheless, the paper should be published, and I hope the authors will rewrite those parts.
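For concreteness, the "predict a subset from its complement" criterion can be written roughly as below. The notation is ours and not necessarily the paper's: v collects all observed variables (inputs and labels), S is a randomly drawn subset of their indices, and Q_i is the mean-field marginal for v_i obtained after running a fixed number of mean-field steps with v_{-S} clamped to the data.

```latex
\max_{\theta}\;
\mathbb{E}_{v \sim p_{\text{data}}}\,
\mathbb{E}_{S}\!\left[\sum_{i \in S} \log Q_i\!\left(v_i \mid v_{-S};\, \theta\right)\right]
```

Because Q is produced by a fixed, differentiable sequence of updates, an objective of this form can be maximized by backpropagating through the unrolled inference.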
Deep Boltzmann Machines (DBMs) are usually initialized by greedily training a stack of RBMs and then fine-tuning the overall model using persistent contrastive divergence (PCD). To perform classification, one typically feeds the mean-field features to a separate classifier (e.g. an MLP) that is trained discriminatively. The overall process is therefore somewhat ad hoc, consisting of L + 2 models (where L is the number of hidden layers), each with its own objective. This paper presents a holistic training procedure for DBMs with a single training stage (in which both input and output variables are predicted), producing models that can classify directly and can also efficiently perform other tasks, such as imputing missing inputs. The main technical contribution is the training mechanism itself: it uses the DBM's mean-field equations to induce recurrent nets that are trained to solve different inference tasks (essentially, predicting different subsets of the observed variables).
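To make the idea of "mean-field equations inducing recurrent nets" concrete, here is a minimal NumPy sketch; it is not the authors' implementation. Our assumptions: binary visible units, two binary hidden layers, a softmax label layer attached to the top hidden layer, a simple fixed update schedule, and unknown units initialized to 0.5 (inputs) or a uniform distribution (labels). Observed units outside the query subset stay clamped to the data, query units are re-estimated at every step, and the loss scores only the queried subset. The names `mp_mean_field` and `mp_loss` are hypothetical, and the training step (backpropagating the loss through the unrolled updates, e.g. with an autodiff library) is omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    z = x - x.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mp_mean_field(params, v, y, mask_v, mask_y, n_steps=10):
    """Unrolled mean-field 'recurrent net' for a 2-hidden-layer DBM (hypothetical sketch).

    v: (B, D) binary inputs; y: (B, K) one-hot labels.
    mask_v: (B, D) bool, True where an input is queried (must be predicted).
    mask_y: (B,) bool, True where the label is queried.
    """
    W1, W2, U, bv, b1, b2, by = params
    B, K = y.shape
    # Clamp observed units to the data; initialize queried ones (assumption: 0.5 / uniform).
    v_hat = np.where(mask_v, 0.5, v)
    y_hat = np.where(mask_y[:, None], 1.0 / K, y)
    h2 = np.zeros((B, W2.shape[1]))
    for _ in range(n_steps):
        # Each hidden layer receives input from the layers below and above it.
        h1 = sigmoid(v_hat @ W1 + h2 @ W2.T + b1)
        h2 = sigmoid(h1 @ W2 + y_hat @ U.T + b2)
        # Only queried units are updated; the rest stay clamped to the data.
        v_hat = np.where(mask_v, sigmoid(h1 @ W1.T + bv), v)
        y_hat = np.where(mask_y[:, None], softmax(h2 @ U + by), y)
    return v_hat, y_hat

def mp_loss(v, y, v_hat, y_hat, mask_v, mask_y, eps=1e-7):
    """Negative log-likelihood of the queried subset under the mean-field estimates."""
    v_c = np.clip(v_hat, eps, 1 - eps)
    nll_v = -(v * np.log(v_c) + (1 - v) * np.log(1 - v_c))
    nll_y = -(y * np.log(np.clip(y_hat, eps, 1.0))).sum(axis=1)
    return (nll_v * mask_v).sum() + (nll_y * mask_y).sum()

# Usage: random small model and data, with a random query subset per example.
rng = np.random.default_rng(0)
D, H1, H2, K, B = 20, 16, 8, 3, 5
params = (rng.normal(0, 0.1, (D, H1)), rng.normal(0, 0.1, (H1, H2)),
          rng.normal(0, 0.1, (H2, K)), np.zeros(D), np.zeros(H1),
          np.zeros(H2), np.zeros(K))
v = (rng.random((B, D)) < 0.5).astype(float)
y = np.eye(K)[rng.integers(0, K, B)]
mask_v = rng.random((B, D)) < 0.3  # inputs to predict
mask_y = rng.random(B) < 0.5       # labels to predict
v_hat, y_hat = mp_mean_field(params, v, y, mask_v, mask_y)
print(mp_loss(v, y, v_hat, y_hat, mask_v, mask_y))
```

Drawing fresh masks for every minibatch is what exposes the single recurrent net to many different inference tasks (imputation when inputs are masked, classification when the label is masked) during one training stage.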