Attacks Meet Interpretability: Attribute-steered Detection of Adversarial Samples
Paper summary. Tao et al. propose Attacks Meet Interpretability (AmI), an adversarial example detection scheme based on the interpretability of individual neurons. In the context of face recognition, the authors first identify neurons that correspond to specific face attributes. This is achieved by constructing sets of images where only specific attributes change, and then investigating which neurons fire. In a second step, all other neurons, i.e., neurons not corresponding to any meaningful face attribute, are weakened in order to improve robustness against adversarial examples. The idea is that adversarial examples exploit these non-interpretable neurons to fool the network. Unfortunately, this defense has been shown to be ineffective in [1].

[1] Nicholas Carlini. Is AmI (Attacks Meet Interpretability) Robust to Adversarial Examples? arXiv.org, abs/1902.02322, 2019.

Also find this summary at [davidstutz.de](https://davidstutz.de/category/reading/).
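The two-step pipeline above can be sketched roughly as follows. This is a minimal illustration on synthetic activation vectors; the function names, the difference threshold, and the scaling factor are assumptions for exposition, not the paper's exact attribute-witness extraction procedure:

```python
import numpy as np

def attribute_neurons(acts_base, acts_attr, threshold=1.0):
    """Step 1 (sketch): flag neurons whose activation changes substantially
    between image pairs that differ in only one face attribute."""
    # Mean absolute activation difference per neuron across the image set.
    diff = np.abs(acts_attr - acts_base).mean(axis=0)
    return np.where(diff > threshold)[0]

def weaken_non_attribute(acts, attr_idx, factor=0.1):
    """Step 2 (sketch): scale down neurons not linked to any attribute,
    leaving the interpretable (attribute) neurons untouched."""
    weakened = acts * factor
    weakened[attr_idx] = acts[attr_idx]
    return weakened

# Toy example: 4 image pairs, 5 neurons; only neuron 2 reacts to the attribute.
acts_base = np.zeros((4, 5))
acts_attr = np.zeros((4, 5))
acts_attr[:, 2] = 2.0
idx = attribute_neurons(acts_base, acts_attr)   # -> array([2])
out = weaken_non_attribute(np.ones(5), idx)     # neuron 2 kept at 1.0, rest at 0.1
```

In the paper, an input is then flagged as adversarial when the original and the attribute-steered model disagree on the prediction; the sketch only covers the neuron selection and weakening steps.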
Tao, Guanhong and Ma, Shiqing and Liu, Yingqi and Zhang, Xiangyu
Neural Information Processing Systems Conference - 2018


Summary by David Stutz

