Adversarial Logit Pairing on ShortScience.org

Adversarial Logit Pairing
Kannan, Harini and Kurakin, Alexey and Goodfellow, Ian J.
arXiv e-Print archive - 2018 via Local Bibsonomy

Summary by David Stutz 5 years ago

Kannan et al. propose a defense against adversarial examples called adversarial logit pairing, in which the logits of a clean example and its adversarial counterpart are regularized to be similar. In particular, during adversarial training, they add a regularizer of the form

$\lambda L(f(x), f(x'))$

where $L$ is, for example, the $L_2$ norm of the difference between its two arguments and $f(x')$ denotes the logits of the adversarial example $x'$ (crafted from the clean example $x$). Intuitively, this is a very simple approach: adversarial training itself enforces that the classification results of clean and corresponding adversarial examples are the same, and adversarial logit pairing additionally enforces that the internal representation, i.e., the logits, is similar. In theory, this could also be applied to any set of activations within the network. In the paper, they conclude that

“We hypothesize that adversarial logit pairing works well because it provides an additional prior that regularizes the model toward a more accurate understanding of the classes.”
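To make the regularizer above concrete, here is a minimal NumPy sketch of a training loss with logit pairing. It is an illustration under stated assumptions, not the authors' implementation: the function names, the squared $L_2$ distance as the choice of $L$, the equal weighting of clean and adversarial cross-entropy, and the default `lam=0.5` are all hypothetical choices made for this example. The adversarial logits are assumed to be computed elsewhere, by whatever attack is used during adversarial training.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy for integer labels; logits has shape (batch, classes)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def logit_pairing_penalty(clean_logits, adv_logits):
    """L(f(x), f(x')): squared L2 distance between paired logits, batch-averaged."""
    return np.sum((clean_logits - adv_logits) ** 2, axis=1).mean()


def alp_loss(clean_logits, adv_logits, labels, lam=0.5):
    """Adversarial training loss plus lam * pairing penalty.

    Assumption: the adversarial training term is an equal mix of the clean
    and adversarial cross-entropy; lam=0.5 is an arbitrary illustrative value.
    """
    ce = 0.5 * (softmax_cross_entropy(clean_logits, labels)
                + softmax_cross_entropy(adv_logits, labels))
    return ce + lam * logit_pairing_penalty(clean_logits, adv_logits)
```

When the clean and adversarial logits coincide, the penalty vanishes and the loss reduces to the plain cross-entropy term; the further the adversarial logits drift from the clean ones, the more the `lam`-weighted term dominates.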

In experiments, they show that this approach slightly outperforms adversarial training alone on SVHN, MNIST, and ImageNet.

Also find this summary at [davidstutz.de](https://davidstutz.de/category/reading/).
