Intriguing properties of neural networksIntriguing properties of neural networksChristian Szegedy and Wojciech Zaremba and Ilya Sutskever and Joan Bruna and Dumitru Erhan and Ian Goodfellow and Rob Fergus2013
Paper summaryabhshkdzThe paper introduces two key properties of deep neural networks:
- Semantic meaning of individual units.
- Earlier works analyzed learnt semantics by finding images that maximally activate individual units.
- Authors observe that there is no difference between individual units and random linear combinations of units.
- It is the entire space of activations that contains the bulk of semantic information.
- Stability of neural networks to small perturbations in input space.
- Networks that generalize well are expected to be robust to small perturbations in the input, i.e. imperceptible noise in the input shouldn't change the predicted class.
- Authors find that networks can be made to misclassify an image by applying a certain imperceptible perturbation, which is found by maximizing the network's prediction error.
- These 'adversarial examples' generalize well to different architectures trained on different data subsets.
- The authors propose a way to make networks more robust to small perturbations by training them with adversarial examples in an adaptive manner, i.e. keep changing the pool of adversarial examples during training. In this regard, they draw a connection with hard-negative mining, and a network trained with adversarial examples performs better than others.
- Formal description of how to generate adversarial examples and mathematical analysis of a network's stability to perturbations are useful studies.
## Weaknesses / Notes
- Two images that are visually indistinguishable to humans but classified differently by the network is indeed an intriguing observation.
- The paper feels a little half-baked in parts, and some ideas could've been presented more clearly.
Intriguing properties of neural networks
arXiv e-Print archive - 2013 via Local arXiv
cs.CV, cs.LG, cs.NE
First published: 2013/12/21 (6 years ago) Abstract: Deep neural networks are highly expressive models that have recently achieved
state of the art performance on speech and visual recognition tasks. While
their expressiveness is the reason they succeed, it also causes them to learn
uninterpretable solutions that could have counter-intuitive properties. In this
paper we report two such properties.
First, we find that there is no distinction between individual high level
units and random linear combinations of high level units, according to various
methods of unit analysis. It suggests that it is the space, rather than the
individual units, that contains of the semantic information in the high layers
of neural networks.
Second, we find that deep neural networks learn input-output mappings that
are fairly discontinuous to a significant extend. We can cause the network to
misclassify an image by applying a certain imperceptible perturbation, which is
found by maximizing the network's prediction error. In addition, the specific
nature of these perturbations is not a random artifact of learning: the same
perturbation can cause a different network, that was trained on a different
subset of the dataset, to misclassify the same input.
Szegedy et al. were (to the best of my knowledge) the first to describe the phenomen of adversarial examples as researched today. Specifically, they described the main objective in order to obtain adversarial examples as
$\arg\min_r \|r\|_2$ s.t. $f(x+r)=l$ and $x+r$ being a valid image
where $f$ is the neural network and $l$ the target class (i.e. targeted adversarial example). In the paper, they originally headlined the section by “blind spots in neural networks”. While they give some explanation and provide experiments, also introducing the notion of transferability of adversarial examples and an idea of adversarial examples used as regularization during training, many questions are left open. The given conclusion, that these adversarial examples are highly unlikely and that these examples lie dense within regular training examples are controversial in the literature.