First-order Adversarial Vulnerability of Neural Networks and Input Dimension
Carl-Johann Simon-Gabriel, Yann Ollivier, Léon Bottou, Bernhard Schölkopf and David Lopez-Paz
arXiv e-Print archive - 2018 via arXiv
Keywords:
stat.ML, cs.CV, cs.LG, 68T45, I.2.6
First published: 2018/02/05 (6 years ago)
Abstract: Over the past few years, neural networks were proven vulnerable to
adversarial images: targeted but imperceptible image perturbations lead to
drastically different predictions. We show that adversarial vulnerability
increases with the gradients of the training objective when viewed as a
function of the inputs. Surprisingly, vulnerability does not depend on network
topology: for many standard network architectures, we prove that at
initialization, the $\ell_1$-norm of these gradients grows as the square root
of the input dimension, leaving the networks increasingly vulnerable with
growing image size. We empirically show that this dimension dependence persists
after either usual or robust training, but gets attenuated with higher
regularization.
Simon-Gabriel et al. study the robustness of neural networks with respect to the input dimensionality. Their main hypothesis is that the vulnerability of neural networks to adversarial perturbations increases with the input dimension. To support this hypothesis, they provide a theoretical analysis as well as experiments.
The general idea of robustness is that a small perturbation $\delta$ of the input $x$ should result only in a small variation $\delta \mathcal{L}$ of the loss:
$\delta \mathcal{L} = \max_{\|\delta\| \leq \epsilon} |\mathcal{L}(x + \delta) - \mathcal{L}(x)| \approx \max_{\|\delta\| \leq \epsilon} |\partial_x \mathcal{L} \cdot \delta| = \epsilon \|\partial_x \mathcal{L}\|_*$
where the approximation is a first-order Taylor expansion and $\|\cdot\|_*$ denotes the dual norm of $\|\cdot\|$. As a result, the vulnerability of a network can be quantified by $\epsilon\,\mathbb{E}_x \|\partial_x \mathcal{L}\|_*$. A natural regularizer to increase robustness (i.e. decrease vulnerability) is therefore $\epsilon \|\partial_x \mathcal{L}\|_*$, which is similar to the regularizer proposed in [1].
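A minimal PyTorch sketch of such a gradient-norm penalty, assuming an $\ell_\infty$ perturbation ball (whose dual norm is $\ell_1$); the function name `gradient_norm_penalty` and the cross-entropy task loss are illustrative choices, not the authors' exact setup:

```python
import torch
import torch.nn.functional as F

def gradient_norm_penalty(model, x, y, epsilon):
    """First-order vulnerability estimate eps * E_x ||dL/dx||_1.
    The l1-norm is the dual norm of the l_inf ball the perturbation lives in."""
    x = x.clone().requires_grad_(True)
    # reduction='sum' makes each gradient row the per-sample gradient dL_i/dx_i
    loss = F.cross_entropy(model(x), y, reduction='sum')
    # create_graph=True lets the penalty itself be backpropagated during training
    grad, = torch.autograd.grad(loss, x, create_graph=True)
    return epsilon * grad.flatten(1).abs().sum(dim=1).mean()

# usage: total_loss = F.cross_entropy(model(x), y) + gradient_norm_penalty(model, x, y, epsilon=8/255)
```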
The remainder of the paper studies how the norm $\|\partial_x \mathcal{L}\|$ behaves as a function of the input dimension $d$. Specifically, for many standard architectures they show that, at initialization, the $\ell_1$-norm of the input gradient grows like $\sqrt{d}$, i.e. the gradient norm (and with it the vulnerability) increases monotonically with the input dimension; I refer to the paper for the exact theorems and proofs. This result is derived for untrained networks that have merely been initialized, but the experiments indicate that the conclusion also holds in realistic settings, e.g. on ImageNet.
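The scaling at initialization can be probed with a small experiment; a sketch, assuming a randomly initialized fully connected ReLU network and random Gaussian inputs (architecture, widths and sample counts are arbitrary illustrative choices, not those of the paper):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def mean_grad_l1(d, width=512, n_samples=64):
    """Average l1-norm of the input gradient for a freshly initialized MLP
    evaluated on random Gaussian inputs with random labels."""
    net = nn.Sequential(nn.Linear(d, width), nn.ReLU(),
                        nn.Linear(width, width), nn.ReLU(),
                        nn.Linear(width, 10))
    x = torch.randn(n_samples, d, requires_grad=True)
    y = torch.randint(0, 10, (n_samples,))
    # reduction='sum' makes each gradient row the per-sample gradient
    loss = F.cross_entropy(net(x), y, reduction='sum')
    grad, = torch.autograd.grad(loss, x)
    return grad.abs().sum(dim=1).mean().item()

for d in [64, 256, 1024, 4096]:
    # if the sqrt(d) scaling holds, successive values should grow by roughly 2 (= sqrt(4))
    print(d, mean_grad_l1(d))
```

The exact constants depend on the initialization scheme and architecture; with PyTorch's default fan-in-scaled initialization the trend should roughly follow $\sqrt{d}$.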
[1] Matthias Hein, Maksym Andriushchenko: Formal Guarantees on the Robustness of a Classifier against Adversarial Manipulation. NIPS 2017: 2263-2273.
Also view this summary at [davidstutz.de](https://davidstutz.de/category/reading/).