Towards Deep Learning Models Resistant to Adversarial Attacks on ShortScience.org

Towards Deep Learning Models Resistant to Adversarial Attacks
Aleksander Madry and Aleksandar Makelov and Ludwig Schmidt and Dimitris Tsipras and Adrian Vladu
arXiv e-Print archive - 2017 via Local arXiv
Keywords: stat.ML, cs.LG, cs.NE

Summaries/Notes 1

Summary by David Stutz 5 years ago

Madry et al. interpret training on adversarial examples as a saddle-point (i.e. min-max) problem. Based on this formulation, they conduct several experiments on MNIST and CIFAR-10 that support the following conclusions:
- Projected gradient descent (PGD) might be the “strongest” adversary that uses only first-order information. Here, gradient steps are used to maximize the loss of the classifier directly, with each iterate projected back onto the set of “allowed” perturbations (e.g. an $\epsilon$-ball around each sample). This observation is based on running projected gradient descent with a large number of random restarts. Regarding the number of restarts, the authors also note that an adversary should be bounded in its computational resources, similar to polynomially bounded adversaries in cryptography.
- Network capacity plays an important role in training robust neural networks using the min-max formulation (i.e. using adversarial training). In particular, the authors suggest that increased capacity is needed to fit/learn adversarial examples without overfitting. Additionally, increased capacity (in combination with a strong adversary) decreases the transferability of adversarial examples.
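
To make the first point concrete: the min-max problem has the form $\min_\theta \mathbb{E}_{(x,y)}\left[\max_{\|\delta\|_\infty \leq \epsilon} L(\theta, x + \delta, y)\right]$, and projected gradient descent approximates the inner maximization. The following is a minimal sketch of that inner loop, not the authors' implementation. It assumes an $L_\infty$ ball, signed gradient steps, and a toy linear logistic classifier so that the input gradient is available in closed form; the function names, step size, and numbers of steps and restarts are illustrative choices, and clipping to a valid pixel range is omitted.

```python
import numpy as np

def loss_and_grad(w, b, x, y):
    """Logistic loss of a toy linear classifier and its gradient w.r.t. the input x.

    Assumption for illustration only: a linear model with label y in {0, 1}.
    """
    z = x @ w + b
    p = 1.0 / (1.0 + np.exp(-z))
    loss = -(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    grad_x = (p - y) * w  # derivative of the logistic loss w.r.t. x
    return loss, grad_x

def pgd_attack(w, b, x, y, epsilon=0.3, step_size=0.05, steps=20, restarts=5, seed=0):
    """L_inf PGD with random restarts; returns the highest-loss point found."""
    rng = np.random.default_rng(seed)
    best_x = x.copy()
    best_loss = loss_and_grad(w, b, x, y)[0]
    for _ in range(restarts):
        # Random start inside the epsilon-ball around x.
        x_adv = x + rng.uniform(-epsilon, epsilon, size=x.shape)
        for _ in range(steps):
            _, g = loss_and_grad(w, b, x_adv, y)
            # Ascent step on the loss (signed gradient for the L_inf geometry).
            x_adv = x_adv + step_size * np.sign(g)
            # Projection back onto the set of allowed perturbations.
            x_adv = np.clip(x_adv, x - epsilon, x + epsilon)
        loss = loss_and_grad(w, b, x_adv, y)[0]
        if loss > best_loss:
            best_x, best_loss = x_adv, loss
    return best_x, best_loss

# Hypothetical toy inputs, chosen only to exercise the function.
w = np.array([1.0, -2.0, 0.5])
b = 0.1
x = np.array([0.2, 0.4, 0.6])
x_adv, adv_loss = pgd_attack(w, b, x, y=1, epsilon=0.1)
```

In adversarial training, each training sample would be replaced by the output of such an attack before the parameter update on $\theta$, which is the outer minimization. The fixed `steps` and `restarts` budget reflects the remark that the adversary should be computationally bounded.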

Also view this summary at [davidstutz.de](https://davidstutz.de/category/reading/).
