Certifying Some Distributional Robustness with Principled Adversarial TrainingCertifying Some Distributional Robustness with Principled Adversarial TrainingAman Sinha and Hongseok Namkoong and John Duchi2017
Paper summaryjanrocketmanA novel method for adversarially-robust learning with theoretical guarantees under small perturbations.
1) Given the default distribution P_0, defines a proximity of it as a set of distributions which are \rho-close to P_0 in terms of Wasserstein metric with a predefined cost function c (e.g. L2);
2) Formulates the robust learning problem as minimization of the worst-case example in the proximity and proposes a Lagrangian relaxation of it;
3) Given it, provides a data-dependent upper bound on the worst-case loss, demonstrates that the problem of finding the worst-case adversarial perturbation, which is generally NP hard, renders to optimization of a concave function if the maximum amount of perturbation \rho is low.
First published: 2017/10/29 (2 years ago) Abstract: Neural networks are vulnerable to adversarial examples and researchers have
proposed many heuristic attack and defense mechanisms. We address this problem
through the principled lens of distributionally robust optimization, which
guarantees performance under adversarial input perturbations. By considering a
Lagrangian penalty formulation of perturbing the underlying data distribution
in a Wasserstein ball, we provide a training procedure that augments model
parameter updates with worst-case perturbations of training data. For smooth
losses, our procedure provably achieves moderate levels of robustness with
little computational or statistical cost relative to empirical risk
minimization. Furthermore, our statistical guarantees allow us to efficiently
certify robustness for the population loss. For imperceptible perturbations,
our method matches or outperforms heuristic approaches.
Sinha et al. introduce a variant of adversarial training based on distributional robust optimization. I strongly recommend reading the paper for understanding the introduced theoretical framework. The authors also provide guarantees on the obtained adversarial loss – and show experimentally that this guarantee is a realistic indicator. The adversarial training variant itself follows the general strategy of training on adversarially perturbed training samples in a min-max framework. In each iteration, an attacker crafts an adversarial examples which the network is trained on. In a nutshell, their approach differs from previous ones (apart from the theoretical framework) in the used attacker. Specifically, their attacker optimizes
$\arg\max_z l(\theta, z) - \gamma \|z – z^t\|_p^2$
where $z^t$ is a training sample chosen randomly during training.
On a side note, I also recommend reading the reviews of this paper: https://openreview.net/forum?id=Hk6kPgZA-
Also view this summary at [davidstutz.de](https://davidstutz.de/category/reading/).