- Fast: tagging is a single linear projection from image features to semantic tags, i.e., learn a linear projection **W**. In particular, test time is essentially O(1) per image, compared with at least O(N) for nearest-neighbor methods.
- Enrich incomplete tags: learn a tag enrichment projection **B** that turns on tags likely to co-occur with the observed ones.
- Marginalized blank-out regularization: assume the observed tags are a corrupted (incomplete) version of the complete tag set, and approximate the unknown corrupting distribution with a piecewise-uniform blank-out distribution that is marginalized out during training (see the sketch after this list).
- Stacking: stack multiple layers of linear projections so the model can recover tags that do not directly co-occur but tend to appear in similar contexts.
- Rare tags and non-uniform corruption: only optimize over tags whose recall falls below some threshold on the validation set.
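A minimal NumPy sketch of how the pieces above could fit together as alternating closed-form updates. The feature matrix `X`, binary tag matrix `Y`, blank-out probability `p`, and the particular coupling between **B** and **W** are assumptions for illustration only; the exact objective and regularizers in the paper may differ.

```python
import numpy as np

def fasttag_sketch(X, Y, p=0.3, gamma=1.0, lam=1e-3, n_iters=10):
    """Alternating closed-form updates in the spirit of the method (assumed form).

    X : (d, n) image features, Y : (t, n) binary tag indicators,
    p : blank-out (tag deletion) probability.  Illustrates (i) analytic
    marginalization over blank-out corruption for the enrichment matrix B and
    (ii) a ridge-regression update for the image projection W.
    """
    d, n = X.shape
    t = Y.shape[0]
    q = 1.0 - p                              # probability a tag survives blank-out
    B = np.eye(t)

    # Marginalized blank-out statistics (no sampling of corrupted copies needed):
    #   E[y~]      = q * y
    #   E[y~ y~^T] = q^2 * y y^T off the diagonal, q * diag(y) on the diagonal.
    S = Y @ Y.T
    Eyy = (q ** 2) * S
    np.fill_diagonal(Eyy, q * np.diag(S))

    XXt = X @ X.T + lam * np.eye(d)
    for _ in range(n_iters):
        # W-step: ridge regression from image features onto the enriched tags B y.
        W = np.linalg.solve(XXt, (B @ Y @ X.T).T).T
        # B-step: denoise blanked-out tags toward y while agreeing with W x,
        # with the expectation over corruptions taken analytically via Eyy.
        lhs = Eyy + gamma * S + lam * np.eye(t)
        rhs = q * S + gamma * (W @ X) @ Y.T
        B = np.linalg.solve(lhs, rhs.T).T
    return W, B

# At test time, tagging a new image is one matrix-vector product:
#   scores = W @ x_new   (take the top-k scores as predicted tags)
```

Because the corruption is marginalized analytically, each update is a single linear solve rather than a pass over sampled corrupted copies, which is what keeps training and testing cheap.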
#### Problem addressed:
It has been empirically observed that deep representations lead to better mode mixing when sampled using MCMC. The authors present a set of hypotheses as to why this happens and confirm them empirically.
The paper claims that deep representations (especially from parametric models) disentangle the factors of variation in the raw feature space. This disentangling leads to better "mode mixing" during MCMC sampling. For example, in faces the factors of variation could be identity, pose, and illumination. If the higher layer learns these factors, then changing the representation in this space starting from a "valid" point changes each of these factors directly and hence still produces "valid" images, which in the original feature space could be far apart; thus better mode mixing. This hypothesis is explained using two additional ones: (a) the manifold structure of the "valid" data is flattened in the higher-layer space, and (b) the fraction of total volume occupied by high-probability (valid) points is larger in the higher-layer space. While (a) should lead to better interpolation in the higher-layer space, (b) should lead to more valid points in a Parzen window around any known sample. Both are confirmed experimentally.
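A small illustration of hypothesis (a): interpolating between two samples in a higher-layer representation and decoding back should yield more "valid" intermediate images than blending pixels directly. The `encode`/`decode` callables are hypothetical stand-ins for the two halves of a trained deep auto-encoder, not the paper's actual code.

```python
import numpy as np

def interpolate_pixels(x_a, x_b, n_steps=8):
    """Straight-line interpolation in the raw input (pixel) space."""
    alphas = np.linspace(0.0, 1.0, n_steps)
    return np.stack([(1 - a) * x_a + a * x_b for a in alphas])

def interpolate_codes(encode, decode, x_a, x_b, n_steps=8):
    """Interpolate in a higher-layer representation and map back to inputs.

    If the deep layer flattens the data manifold, the decoded intermediate
    points should look like valid samples far more often than pixel blends.
    """
    h_a, h_b = encode(x_a), encode(x_b)
    alphas = np.linspace(0.0, 1.0, n_steps)
    return np.stack([decode((1 - a) * h_a + a * h_b) for a in alphas])
```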
Novel intuitions as to why deep representations are good for generative modeling.
No theoretical justification is given.
MNIST, Toronto Face dataset (TFD)
#### Additional remarks:
Used a DBN and a deep contractive auto-encoder (CAE) for the experiments on these datasets.
This paper proposes a technique for learning the structure of SPNs by recursively splitting the data set along instances and dimensions. The data set is split along dimensions if the set of variables can be partitioned into independent subsets; otherwise, it is split along instances. The paper improves upon previous work of Dennis and Ventura (NIPS 2012) by splitting on both instances and variables in a manner that fits elegantly with the simplified recursive definition of SPNs used in the paper. The algorithm guarantees a locally optimal structure if an independence oracle is available. Results show that the model is comparable to other graphical-model learning approaches in terms of log probs but massively outperforms them in terms of time.
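A rough sketch of the recursive splitting scheme, to make the control flow concrete. The `split_variables` and `cluster_instances` helpers are hypothetical placeholders for the paper's independence test and instance clustering; the actual tests, thresholds, and leaf distributions differ.

```python
import numpy as np

def learn_spn(data, scope, split_variables, cluster_instances, min_instances=30):
    """Recursive SPN structure learning (sketch, not the paper's exact algorithm).

    data  : (n_instances, n_vars) array; scope : list of variable indices in play.
    Returns a nested tuple standing in for an SPN node.
    """
    if len(scope) == 1 or len(data) < min_instances:
        return ("leaf", scope)                       # univariate distribution

    # Try to partition the scope into (approximately) independent subsets.
    groups = split_variables(data, scope)
    if len(groups) > 1:
        # Product node: children get disjoint scopes over the same instances.
        return ("product", [learn_spn(data, g, split_variables,
                                      cluster_instances, min_instances)
                            for g in groups])

    # Otherwise split the instances: a sum node whose children are clusters,
    # each weighted by its share of the instances.
    labels = np.asarray(cluster_instances(data, scope))
    clusters = np.unique(labels)
    if len(clusters) == 1:
        # Degenerate clustering: fall back to fully factorized leaves.
        return ("product", [("leaf", [v]) for v in scope])

    children = []
    for k in clusters:
        subset = data[labels == k]
        weight = len(subset) / len(data)
        children.append((weight, learn_spn(subset, scope, split_variables,
                                           cluster_instances, min_instances)))
    return ("sum", children)
```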
The proposed algorithm is a novel application of simple clustering and mutual-independence testing to SPN structure learning. The paper is well written and explains SPNs and the structure-learning algorithm clearly.
- The authors do a thorough evaluation on a large number of data sets.
- Significant gains in conditional log probs (though, as pointed out in the paper, this might be partly due to exact inference in SPNs compared to lower bounds in other models).
- Much faster inference at the cost of slightly worse log probs.
- The algorithm makes hard decisions to split the data recursively. This makes portions of the data set completely inaccessible to each other after each recursion step. This might make the model very sensitive to any sub-optimal splits made early on.
- Weak or higher-order relationships among variables that are not captured by the independence checker may be lost.
- The algorithm can only learn SPNs where product nodes have disjoint scopes.