Adversarial Logit Pairing
Kannan, Harini and Kurakin, Alexey and Goodfellow, Ian J.
2018

Paper summary
davidstutz
Kannan et al. propose a defense against adversarial examples called adversarial logit pairing where the logits of clean and adversarial example are regularized to be similar. In particular, during adversarial training, they add a regularizer of the form
$\lambda L(f(x), f(x'))$
where $L$ is, for example, the $L_2$ norm and $f(x')$ denotes the logits corresponding to the adversarial example $x'$ (derived from the clean example $x$). Intuitively, this is a very simple approach: adversarial training itself enforces that clean and corresponding adversarial examples receive the same classification result, and adversarial logit pairing additionally enforces that the internal representation, i.e., the logits, be similar. In principle, the same idea could be applied to any set of activations within the network. In the paper, they conclude that
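As a minimal sketch of this regularizer, the following combines a standard cross-entropy term on adversarial examples (as in adversarial training) with a squared-$L_2$ logit-pairing penalty. The weight `lam` and the squared-$L_2$ choice for $L$ are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

def cross_entropy(logits, labels):
    # Numerically stable softmax cross-entropy, averaged over the batch.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def adversarial_logit_pairing_loss(logits_clean, logits_adv, labels, lam=0.5):
    """Adversarial training loss plus a logit-pairing term
    lam * ||f(x) - f(x')||_2^2, averaged over the batch.
    `lam` is a hypothetical value chosen for illustration."""
    ce = cross_entropy(logits_adv, labels)  # adversarial training term
    pairing = np.mean(np.sum((logits_clean - logits_adv) ** 2, axis=1))
    return ce + lam * pairing
```

When the clean and adversarial logits coincide, the pairing term vanishes and the loss reduces to plain adversarial training; any divergence between the two logit vectors is penalized on top of the classification loss.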
“We hypothesize that adversarial logit pairing works well because it provides an additional prior that regularizes the model toward a more accurate understanding of the classes.”
In experiments, they show that this approach slightly outperforms adversarial training alone on MNIST, SVHN, and ImageNet.
Also find this summary at [davidstutz.de](https://davidstutz.de/category/reading/).
arXiv e-Print archive - 2018 via Local Bibsonomy

Keywords: dblp
