Explaining and Harnessing Adversarial Examples on ShortScience.org

Explaining and Harnessing Adversarial Examples
Ian J. Goodfellow and Jonathon Shlens and Christian Szegedy
arXiv e-Print archive - 2014 via Local arXiv
Keywords: stat.ML, cs.LG

Summary by David Stutz 5 years ago

Goodfellow et al. introduce the fast gradient sign method (FGSM) to craft adversarial examples and further provide a possible interpretation of adversarial examples based on linear models. FGSM is a gradient-based, one-step method for generating adversarial examples. In particular, letting $J$ be the objective optimized during training and $\epsilon$ be the maximum $L_\infty$ norm of the adversarial perturbation, FGSM computes
$x' = x + \eta = x + \epsilon \text{sign}(\nabla_x J(x, y))$
where $y$ is the label for sample $x$. The $\text{sign}$ function is applied element-wise here. The paper demonstrates the method on several examples, and it is commonly used in related work.
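
To make the update rule concrete, here is a minimal sketch in plain Python. It assumes a toy logistic regression model with cross-entropy loss, for which the input gradient has the closed form $(p - y) w$; the model, the helper names (`predict`, `input_gradient`, `fgsm`) and all numeric values are illustrative assumptions, not taken from the paper.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def sign(v):
    """Return -1, 0 or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def predict(w, b, x):
    """Probability of class 1 under the toy logistic regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def input_gradient(w, b, x, y):
    """Gradient of the cross-entropy loss J(x, y) with respect to x.

    For logistic regression this has the closed form (p - y) * w.
    """
    p = predict(w, b, x)
    return [(p - y) * wi for wi in w]


def fgsm(w, b, x, y, epsilon):
    """One FGSM step: x' = x + epsilon * sign(grad_x J(x, y))."""
    grad = input_gradient(w, b, x, y)
    return [xi + epsilon * sign(gi) for xi, gi in zip(x, grad)]


# Hypothetical toy values, chosen only for illustration.
w, b = [2.0, -3.0, 1.0], 0.0
x, y = [0.2, -0.1, 0.1], 1
x_adv = fgsm(w, b, x, y, epsilon=0.25)

print(predict(w, b, x))      # confidence in the true class before the attack
print(predict(w, b, x_adv))  # confidence in the true class after the attack
```

With these toy values, the single step moves the predicted probability of the true class from above 0.5 to below 0.5, while no input coordinate changes by more than $\epsilon$. For a deep network the closed-form gradient would be replaced by automatic differentiation.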

In the remainder of the paper, Goodfellow et al. discuss a linear interpretation of why adversarial examples exist. Specifically, considering the dot product
$w^T x' = w^T x + w^T \eta$
it becomes apparent that the perturbation $\eta$, although insignificant on a per-pixel level (i.e. each entry is at most $\epsilon$ in magnitude), can significantly influence the activation of a single neuron. What is more, this effect becomes more pronounced as the dimensionality of $x$ grows: for $\eta = \epsilon \text{sign}(w)$, the change $w^T \eta = \epsilon \|w\|_1$ scales linearly with the number of dimensions. Additionally, many network architectures today use $\text{ReLU}$ activations, which are piecewise linear.
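
The dependence on dimensionality can be checked numerically. The sketch below assumes the worst-case perturbation $\eta = \epsilon \text{sign}(w)$ and a weight vector whose entries all have the same magnitude `m`; the function name `activation_shift` and the values of `epsilon` and `m` are hypothetical choices for illustration.

```python
def sign(v):
    """Return -1, 0 or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def activation_shift(w, epsilon):
    """Change in w^T x caused by the perturbation eta = epsilon * sign(w)."""
    eta = [epsilon * sign(wi) for wi in w]
    return sum(wi * ei for wi, ei in zip(w, eta))


epsilon = 0.01  # hypothetical per-coordinate budget
m = 0.5         # hypothetical magnitude of each weight

for n in (10, 100, 1000, 10000):
    # Weights alternate in sign but share the same magnitude m.
    w = [m if i % 2 == 0 else -m for i in range(n)]
    print(n, activation_shift(w, epsilon))
```

Each coordinate of $\eta$ stays at $\epsilon$ in magnitude, yet the shift in the activation equals $\epsilon m n$ and therefore grows linearly with the input dimension $n$.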

Goodfellow et al. conduct several more experiments; I want to highlight the conclusions of some of them:
- Training on adversarial samples can be seen as regularization. Based on experiments, it is more effective than $L_1$ regularization or adding random noise.
- The direction of the perturbation matters most. Adversarial samples might transfer between models because similar models learn similar functions, so the same directions are similarly effective against them.
- Ensembles are not necessarily resistant to perturbations.
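
For reference, the adversarial training mentioned in the first point can be sketched as a weighted sum of the loss on the clean sample and the loss on its FGSM counterpart. The weight $\alpha$ is not defined in this summary; to my knowledge the paper sets it to $0.5$ in its experiments:
$\tilde{J}(x, y) = \alpha J(x, y) + (1 - \alpha) J(x + \epsilon \text{sign}(\nabla_x J(x, y)), y)$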

Also view this summary at [davidstutz.de](https://davidstutz.de/category/reading/).
