Towards Deep Learning Models Resistant to Adversarial Attacks
Paper summary

Madry et al. interpret training on adversarial examples as a saddle-point (i.e. min-max) problem. Based on this formulation, they conduct several experiments on MNIST and CIFAR-10 supporting the following conclusions:

- Projected gradient descent might be the “strongest” adversary using first-order information. Here, gradient steps are used to maximize the loss of the classifier directly, with each iterate projected back onto the set of “allowed” perturbations (e.g. an $\epsilon$-ball around each sample). This observation is based on a large number of random restarts of projected gradient descent. Regarding the number of restarts, the authors also note that an adversary should be bounded in its computational resources, similar to polynomially bounded adversaries in cryptography.
- Network capacity plays an important role in training robust neural networks using the min-max formulation (i.e. using adversarial training). In particular, the authors suggest that increased capacity is needed to fit/learn adversarial examples without overfitting. Additionally, increased capacity (in combination with a strong adversary) decreases the transferability of adversarial examples.
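To make the inner maximization of the min-max problem, $\min_\theta \mathbb{E}_{(x,y)} [\max_{\|\delta\|_\infty \le \epsilon} L(\theta, x + \delta, y)]$, more concrete, here is a minimal sketch of projected gradient descent with random restarts on the $L_\infty$ ball. It is not the authors' implementation. The linear logistic classifier, the function names `loss_and_grad` and `pgd_linf`, and all default values (`eps`, `step`, `steps`, `restarts`) are assumptions chosen for illustration, and plain Python lists are used so the example has no dependencies.

```python
import math
import random


def loss_and_grad(w, b, x, y):
    """Logistic loss log(1 + exp(-y * (w.x + b))) and its gradient w.r.t. x.

    Assumes a linear classifier (illustrative only) and a label y in {-1, +1}.
    """
    margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    loss = math.log1p(math.exp(-margin))
    # d loss / d x = -y * sigmoid(-margin) * w
    coeff = -y / (1.0 + math.exp(margin))
    return loss, [coeff * wi for wi in w]


def sign(v):
    """Return -1, 0 or +1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def pgd_linf(w, b, x0, y, eps=0.3, step=0.05, steps=20, restarts=5, seed=0):
    """L-infinity PGD with random restarts; returns (best point, its loss)."""
    rng = random.Random(seed)
    best_x, best_loss = list(x0), loss_and_grad(w, b, x0, y)[0]
    for _ in range(restarts):
        # Each restart begins at a uniformly random point inside the eps-ball.
        x = [xi + rng.uniform(-eps, eps) for xi in x0]
        for _ in range(steps):
            _, grad = loss_and_grad(w, b, x, y)
            # Ascent step along the sign of the gradient (maximizes the loss).
            x = [xi + step * sign(gi) for xi, gi in zip(x, grad)]
            # Projection: clip every coordinate back into [x0 - eps, x0 + eps].
            x = [min(max(xi, ci - eps), ci + eps) for xi, ci in zip(x, x0)]
        loss = loss_and_grad(w, b, x, y)[0]
        # Keep the restart that reached the highest loss.
        if loss > best_loss:
            best_x, best_loss = x, loss
    return best_x, best_loss
```

Calling, for example, `pgd_linf([1.0, -2.0], 0.0, [0.5, -0.5], 1)` returns a perturbed point that stays within `eps` of the input in every coordinate and has a higher loss than the clean input. For a linear model all restarts reach the same corner of the ball, so restarts only become informative for non-concave losses such as those of deep networks. Adversarial training would then minimize the loss over the model parameters at such points.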
Towards Deep Learning Models Resistant to Adversarial Attacks
Aleksander Madry and Aleksandar Makelov and Ludwig Schmidt and Dimitris Tsipras and Adrian Vladu
arXiv e-Print archive - 2017 via Local arXiv
Keywords: stat.ML, cs.LG, cs.NE


Summary by David Stutz 2 years ago