On the (Statistical) Detection of Adversarial Examples on ShortScience.org

arxiv.org
scholar.google.com

On the (Statistical) Detection of Adversarial Examples
Kathrin Grosse and Praveen Manoharan and Nicolas Papernot and Michael Backes and Patrick McDaniel
arXiv e-Print archive - 2017 via Local arXiv
Keywords: cs.CR, cs.LG, stat.ML
more

Summaries/Notes 1

[link] Summary by David Stutz 5 years ago

Grosse et al. use statistical tests to detect adversarial examples; additionally, machine learning algorithms are adapted to detect adversarial examples on-the-fly of performing classification. The idea of using statistics tests to detect adversarial examples is simple: assuming that there is a true data distribution, a machine learning algorithm can only approximate this distribution – i.e. each algorithm “learns” an approximate distribution. The ideal adversary uses this discrepancy to draw a sample from the data distribution where data distribution and learned distribution differ – resulting in mis-classification. In practice, they show that kernel-based two-sample statistics hypothesis testing can be used to identify a set of adversarial examples (but not individual one). In order to also detect individual ones, each classifier is augmented to also detect whether the input is an adversarial example. This approach is similar to adversarial training, where adversarial examples are included in the training set with the correct label. However, I believe that it is possible to again craft new examples to the augmented classifier – as is also possible with adversarial training.

Your comment:

Write your summary here (You can use $\LaTeX$ and markdown syntax):

Anon Private