Interpretation of Neural Networks is Fragile
Paper summary

Ghorbani et al. show that neural network visualization techniques, often introduced to improve interpretability, are susceptible to adversarial examples. Specifically, they consider common feature-importance visualization techniques and aim to find an adversarial example that leaves the predicted label unchanged but alters the original interpretation, e.g., as measured on some of the most important features. Examples of the so-called top-1000 attack, in which the 1000 most important features are changed during the attack, are shown in Figure 1. The general finding, i.e., that interpretations are not robust or reliable, is relevant for the acceptance and security of deep learning systems in practice.

Figure 1: Examples of changed interpretations.
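To make the idea of a top-k attack concrete, the following is a minimal sketch on a toy model, not the authors' implementation. It assumes the attack's goal is to lower the share of gradient-based importance that falls on the originally top-k features while keeping the predicted label fixed. For simplicity it swaps in a random search within an L-infinity ball instead of a gradient-based optimization, and it uses a tiny tanh network instead of an image classifier. All names (`make_model`, `saliency`, `top_k_attack`) and all parameter values are hypothetical.

```python
import math
import random


def make_model(n_in=8, n_hidden=4, seed=0):
    """Toy one-hidden-layer network with fixed random weights (an assumption)."""
    rng = random.Random(seed)
    W = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    v = [rng.gauss(0, 1) for _ in range(n_hidden)]
    return W, v


def score(model, x):
    """f(x) = sum_j v_j * tanh(w_j . x)."""
    W, v = model
    return sum(vj * math.tanh(sum(w * xi for w, xi in zip(row, x)))
               for row, vj in zip(W, v))


def predict(model, x):
    """Binary label given by the sign of the score."""
    return 1 if score(model, x) >= 0 else 0


def saliency(model, x):
    """Normalized absolute input gradient |df/dx_i| as feature importance."""
    W, v = model
    grads = [0.0] * len(x)
    for row, vj in zip(W, v):
        z = sum(w * xi for w, xi in zip(row, x))
        slope = vj * (1.0 - math.tanh(z) ** 2)  # derivative of v_j * tanh(z)
        for i, w in enumerate(row):
            grads[i] += slope * w
    total = sum(abs(g) for g in grads) or 1.0  # guard against a zero gradient
    return [abs(g) / total for g in grads]


def top_k(values, k):
    """Indices of the k largest values."""
    return sorted(range(len(values)), key=lambda i: -values[i])[:k]


def top_k_attack(model, x, k=2, eps=0.1, trials=500, seed=1):
    """Random-search stand-in for a top-k attack (not the paper's optimizer).

    Keeps the candidate within an L-infinity ball of radius eps that preserves
    the label and puts the least importance on the original top-k features.
    """
    rng = random.Random(seed)
    label = predict(model, x)
    target = top_k(saliency(model, x), k)

    def mass(point):
        s = saliency(model, point)
        return sum(s[i] for i in target)

    best, best_mass = list(x), mass(x)
    for _ in range(trials):
        cand = [xi + rng.uniform(-eps, eps) for xi in x]
        if predict(model, cand) != label:
            continue  # the attack must not change the prediction
        m = mass(cand)
        if m < best_mass:
            best, best_mass = cand, m
    return best, target


model = make_model()
x = [0.5, -0.2, 0.1, 0.8, -0.6, 0.3, -0.1, 0.4]
x_adv, target = top_k_attack(model, x)
print("label before/after:", predict(model, x), predict(model, x_adv))
print("top-k importance before:", sum(saliency(model, x)[i] for i in target))
print("top-k importance after: ", sum(saliency(model, x_adv)[i] for i in target))
```

Because the unperturbed input is the starting candidate, the returned point never has more importance on the original top-k features than the input itself. How much the importance drops depends on the toy weights and on `eps`, so the printed values are illustrative only.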
Interpretation of Neural Networks is Fragile
Ghorbani, Amirata and Abid, Abubakar and Zou, James Y.
arXiv e-Print archive - 2017 via Local Bibsonomy

Summary by David Stutz 3 months ago