### Summary

Knowing when a model is qualified to make a prediction is critical to the safe deployment of ML technology. Model-independent / unsupervised Out-of-Distribution (OoD) detection is appealing largely because it doesn't require task-specific labels to train. It is tempting to use a simple one-tailed test in which inputs assigned lower likelihoods by a likelihood model are flagged as OoD, but the intuition that In-Distribution (ID) inputs should have the highest likelihoods _does not hold in high dimension_. The authors propose the Watanabe-Akaike Information Criterion (WAIC) to circumvent this problem and empirically show the robustness of the approach.

### Counterintuitive Properties of Likelihood Models

https://i.imgur.com/4vo0Ff5.png

A GLOW model with a Gaussian prior maps SVHN closer to the origin than CIFAR (yet never actually generates SVHN, because Gaussian samples concentrate on the shell). This is bad news for OoD detection.

### Proposed Methodology

Use the WAIC criterion for OoD detection, which gives an asymptotically correct estimate of the gap between the training-set and test-set expectations:

https://i.imgur.com/vasSxuk.png

The correction term subtracts the variance in likelihoods across independent samples from the posterior. This robustifies the estimate, penalizing points whose likelihood is sensitive to the particular posterior sample. The authors use an ensemble of generative models as a proxy for posterior samples, i.e. the ensemble members act as approximate posterior samples.
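The ensemble version of the score can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `waic_scores` and the array layout (one row of log-likelihoods per ensemble member) are my own choices.

```python
import numpy as np

def waic_scores(log_liks):
    """WAIC-style OoD score from an ensemble of likelihood models.

    log_liks: array of shape (n_models, n_inputs), where row k holds
    log p(x | theta_k) for ensemble member k (a proxy for a posterior
    sample). Score = mean - variance across the ensemble; lower
    scores suggest the input is out-of-distribution.
    """
    return log_liks.mean(axis=0) - log_liks.var(axis=0)
```

Note how an input that all models agree on keeps its mean log-likelihood, while an input whose likelihood varies wildly across the ensemble is penalized by the variance term.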
Now OoD can be detected with a likelihood model:

https://i.imgur.com/M3CDKOA.png

### Discussion

Interestingly, GLOW maps CIFAR and other datasets INSIDE the Gaussian shell (an annulus of radius $\sqrt{\text{dim}} = \sqrt{3072} \approx 55.4$):

https://i.imgur.com/ERdgOaz.png

This is in itself quite disturbing, as it suggests that better flow-based generative models (for sampling) could be obtained by encouraging the training distribution to overlap better with the typical set in latent space.
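The shell concentration is easy to check numerically. A quick sketch in NumPy, using dimension 3072 (CIFAR's $32 \times 32 \times 3$ input size):

```python
import numpy as np

# Concentration of measure: samples from N(0, I_d) with d = 3072 lie on a
# thin shell of radius ~ sqrt(d) ~= 55.4, far from the origin.
rng = np.random.default_rng(0)
d = 3072
samples = rng.standard_normal((1000, d))
norms = np.linalg.norm(samples, axis=1)

print(norms.mean())  # ~ 55.4, i.e. sqrt(3072)
print(norms.std())   # ~ 0.7, so the shell is very thin relative to its radius
```

So a latent code near the origin, although it maximizes the Gaussian density, is essentially never produced when sampling from the prior; this is why high likelihood and typicality come apart in high dimension.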