WAIC, but Why? Generative Ensembles for Robust Anomaly Detection on ShortScience.org

arxiv.org
arxiv-vanity.com
scholar.google.com

WAIC, but Why? Generative Ensembles for Robust Anomaly Detection
Hyunsun Choi and Eric Jang and Alexander A. Alemi
arXiv e-Print archive - 2018 via Local arXiv
Keywords: stat.ML, cs.LG
more

Summaries/Notes 1

[link] Summary by Massimo Caccia 4 years ago

### Summary
Knowing when a model is qualified to make a prediction is critical to safe deployment of ML technology. Model-independent / Unsupervised Out-of-Distribution (OoD) detection is appealing mostly because it doesn't require task-specific labels to train. It is tempting to suggest a simple one-tailed test in which lower likelihoods are OoD (assigned by a Likelihood Model), but the intuition that In-Distribution (ID) inputs should have highest likelihoods _does not hold in higher dimension_. The authors propose to use the Watanabe-Akaike Information Criterion (WAIC) to circumvent this problem and empirically show the robustness of the approach.

### Counterintuitive Properties of Likelihood Models:
https://i.imgur.com/4vo0Ff5.png
So a GLOW model with Gaussian prior maps SVHN closer to the origin than Cifar (but never actually generates SVHN because Gaussian samples are on the shell). This is bad news for OoD detection.

### Proposed Methodology:
Use the WAIC criterion for OoD detection which gives an asymptotically correct estimate of the gap between the training set and test set expectations:
https://i.imgur.com/vasSxuk.png
Basically,  the correction term subtracts the variance in likelihoods across independent samples from the posterior. This acts to robustify the estimate, ensuring that points that are sensitive to the particular choice of posterior are penalized. They use an ensemble of generative models as a proxy for posterior samples i.e. the ensembles acts as approximate posterior samples.
Now, OoD can be detected with a Likelihood Model:
https://i.imgur.com/M3CDKOA.png
### Discussion
Interestingly, GLOW maps Cifar and other datasets INSIDE the gaussian shell (which is an annulus of radius $\sqrt{dim} = \sqrt{3072} \approx 55.4$
https://i.imgur.com/ERdgOaz.png
This is in itself quite disturbing, as it suggests that better flow-based generative models (for sampling) can be obtained by encouraging the training distribution to overlap better with the typical set in latent
space.

Your comment:

Write your summary here (You can use $\LaTeX$ and markdown syntax):

Anon Private