A Boundary Tilting Persepective on the Phenomenon of Adversarial Examples
Thomas Tanay and Lewis Griffin, 2016
Paper summary by davidstutz
Tanay and Griffin introduce the boundary tilting perspective as an alternative to the “linear explanation” for adversarial examples. Specifically, they argue that it is not reasonable to assume that linearity in deep neural networks causes adversarial examples to exist. Originally, Goodfellow et al. [1] motivated the linear explanation by considering a linear classifier:
$w^T x' = w^Tx + w^T\eta$
where $\eta$ is the adversarial perturbation. In high dimensions, the second term can shift the neuron's activation significantly. Tanay and Griffin, in contrast, argue that the dimensionality does not matter: although the impact of $w^T\eta$ grows with the dimensionality, so does $w^Tx$, so the ratio between the two is preserved. Additionally, they show, by giving a counter-example, that linearity is not sufficient for the existence of adversarial examples.
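The scaling argument can be checked numerically. The following is a minimal sketch, not taken from the paper: it assumes a toy weight vector with entries $\pm 1$, an input $x$ that is a scaled copy of $w$ (so every dimension contributes to $w^Tx$), and a sign perturbation $\eta = \epsilon \cdot \mathrm{sign}(w)$ in the style of Goodfellow et al. The function name `activation_shift` and the parameters `eps` and `scale` are illustrative choices.

```python
import numpy as np

def activation_shift(n, eps=0.1, scale=0.5, seed=0):
    """Return (w^T x, w^T eta, ratio) for a toy linear unit in n dimensions."""
    rng = np.random.default_rng(seed)
    # Assumption: weights are +/-1 in every dimension.
    w = rng.choice([-1.0, 1.0], size=n)
    # Assumption: the input is correlated with w in every dimension.
    x = scale * w
    # Sign perturbation with max-norm eps.
    eta = eps * np.sign(w)
    clean = w @ x    # equals scale * n
    shift = w @ eta  # equals eps * n
    return clean, shift, shift / clean

for n in (10, 100, 1000):
    clean, shift, ratio = activation_shift(n)
    print(f"n={n:5d}  w^T x={clean:8.1f}  w^T eta={shift:8.1f}  ratio={ratio:.3f}")
```

Under these assumptions both terms grow linearly with $n$, so their ratio stays at `eps / scale` regardless of the dimension. An input that is uncorrelated with $w$ would behave differently, so this only illustrates the argument for this particular setup.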
Instead, they offer a different perspective on the existence of adversarial examples, which is formalized over the course of the paper. Their main idea is that the training samples live on a manifold within the input space. The claim is that there are no adversarial examples on the manifold: the classes are well separated there, so it is hard to find adversarial examples for most training samples. However, the decision boundary extends beyond the manifold and may lie close to it, so that adversarial examples that leave the manifold can be found easily. This idea is illustrated in Figure 1.
https://i.imgur.com/SrviKgm.png
Figure 1: Illustration of the underlying idea of the boundary tilting perspective; see the text for details.
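The geometric idea can also be sketched in two dimensions. The example below is a toy construction under stated assumptions, not the paper's experiment: the data "manifold" is the line $y = 0$, the two classes sit at $x < 0$ and $x > 0$, and the linear boundary through the origin is tilted by an angle `theta_deg` away from the nearest centroid classifier $w = (1, 0)$. The helper names `tilted_classifier` and `min_adversarial_perturbation` are hypothetical.

```python
import numpy as np

def tilted_classifier(theta_deg):
    """Unit weight vector tilted by theta_deg away from the x-axis."""
    theta = np.deg2rad(theta_deg)
    return np.array([np.cos(theta), np.sin(theta)])

def min_adversarial_perturbation(w, p):
    """Smallest L2 perturbation moving p onto the boundary w . p = 0."""
    return -(w @ p) / (w @ w) * w

# Assumption: all samples lie on the 1D manifold y = 0.
rng = np.random.default_rng(0)
xs = np.concatenate([rng.uniform(-2, -1, 50), rng.uniform(1, 2, 50)])
points = np.stack([xs, np.zeros_like(xs)], axis=1)
labels = (xs > 0).astype(int)

for theta_deg in (0, 45, 85, 89):
    w = tilted_classifier(theta_deg)
    preds = (points @ w > 0).astype(int)
    accuracy = (preds == labels).mean()
    norms = [np.linalg.norm(min_adversarial_perturbation(w, p)) for p in points]
    print(f"tilt={theta_deg:2d} deg  accuracy={accuracy:.2f}  "
          f"mean perturbation norm={np.mean(norms):.3f}")
```

In this setup the accuracy on the manifold is unaffected by the tilt, while the distance from each sample to the boundary shrinks with the cosine of the tilt angle. The smallest perturbation then points mostly off the manifold, which matches the intuition that adversarial examples leaving the manifold are easy to find when the boundary lies close to it.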
[1] Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy:
Explaining and Harnessing Adversarial Examples. CoRR abs/1412.6572 (2014)
Also find this summary at [davidstutz.de](https://davidstutz.de/category/reading/).
First published: 2016/08/27
Abstract: Deep neural networks have been shown to suffer from a surprising weakness:
their classification outputs can be changed by small, non-random perturbations
of their inputs. This adversarial example phenomenon has been explained as
originating from deep networks being "too linear" (Goodfellow et al., 2014). We
show here that the linear explanation of adversarial examples presents a number
of limitations: the formal argument is not convincing, linear classifiers do
not always suffer from the phenomenon, and when they do their adversarial
examples are different from the ones affecting deep networks.
We propose a new perspective on the phenomenon. We argue that adversarial
examples exist when the classification boundary lies close to the submanifold
of sampled data, and present a mathematical analysis of this new perspective in
the linear case. We define the notion of adversarial strength and show that it
can be reduced to the deviation angle between the classifier considered and the
nearest centroid classifier. Then, we show that the adversarial strength can be
made arbitrarily high independently of the classification performance due to a
mechanism that we call boundary tilting. This result leads us to defining a new
taxonomy of adversarial examples. Finally, we show that the adversarial
strength observed in practice is directly dependent on the level of
regularisation used and the strongest adversarial examples, symptomatic of
overfitting, can be avoided by using a proper level of regularisation.