Intriguing properties of neural networks on ShortScience.org

Intriguing properties of neural networks
Christian Szegedy and Wojciech Zaremba and Ilya Sutskever and Joan Bruna and Dumitru Erhan and Ian Goodfellow and Rob Fergus
arXiv e-Print archive - 2013 via Local arXiv
Keywords: cs.CV, cs.LG, cs.NE

Summary by David Stutz 5 years ago

Szegedy et al. were (to the best of my knowledge) the first to describe the phenomenon of adversarial examples as it is researched today. Specifically, they stated the main objective for obtaining adversarial examples as

$\arg\min_r \|r\|_2$ s.t. $f(x+r)=l$ and $x+r$ being a valid image

where $f$ is the neural network and $l$ is the target class (i.e., a targeted adversarial example). In the paper, the corresponding section is headlined “blind spots in neural networks”. While the authors give some explanation and provide experiments, and also introduce the notion of transferability of adversarial examples and the idea of using adversarial examples as regularization during training, many questions are left open. The conclusions drawn, namely that adversarial examples are highly unlikely to occur and that they lie dense among regular training examples, are controversial in the literature.
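To make the objective above more concrete, here is a minimal sketch in Python/NumPy. It is not the paper's procedure: the hard constraint $f(x+r)=l$ is replaced by a penalty form, $c\|r\|_2^2$ plus the cross-entropy towards $l$, and plain projected gradient descent is used as the solver. The toy two-class linear softmax classifier (`W`, `b`), the input `x`, the function name `targeted_adversarial` and all hyperparameters (`c`, `lr`, `steps`) are assumptions chosen purely for illustration.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def targeted_adversarial(x, W, b, target, c=1.0, lr=0.1, steps=500):
    """Approximate  min_r ||r||_2  s.t.  f(x + r) = target,  x + r in [0, 1]^n.

    Assumption: the hard constraint is relaxed to the penalty objective
        c * ||r||_2^2 + cross_entropy(softmax(W (x + r) + b), target),
    minimized by projected gradient descent. f is a linear softmax classifier.
    """
    r = np.zeros_like(x)
    onehot = np.zeros(W.shape[0])
    onehot[target] = 1.0
    for _ in range(steps):
        p = softmax(W @ (x + r) + b)
        # Gradient of the cross-entropy w.r.t. the input, plus the norm penalty.
        grad = W.T @ (p - onehot) + 2.0 * c * r
        r = r - lr * grad
        # Projection: keep x + r a "valid image", i.e. inside the box [0, 1]^n.
        r = np.clip(x + r, 0.0, 1.0) - x
    return r

# Hypothetical toy setup: two "pixels", two classes, class k scores pixel k.
W = np.eye(2)
b = np.zeros(2)
x = np.array([0.6, 0.4])      # classified as class 0
target = 1                    # the label l we want f(x + r) to output

r = targeted_adversarial(x, W, b, target)
adv = x + r
pred_clean = int(np.argmax(W @ x + b))
pred_adv = int(np.argmax(W @ adv + b))
print("perturbation:", r, "L2 norm:", np.linalg.norm(r))
print("prediction before:", pred_clean, "after:", pred_adv)
```

The weight `c` trades off the size of the perturbation against how strongly the target class is enforced. In this sketch it is fixed by hand, so the result is a small perturbation that reaches the target class rather than a guaranteed minimal one.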
