Towards Interpretable Deep Neural Networks by Leveraging Adversarial Examples
Paper summary

Dong et al. study interpretability in the context of adversarial examples and propose a variant of adversarial training to improve it. The authors first argue that neurons do not preserve their interpretability on adversarial examples; e.g., neurons corresponding to high-level concepts such as “bird” or “dog” do not fire consistently on adversarial examples. They validate this claim experimentally by inspecting deep representations at different layers. To improve interpretability, the authors propose adversarial training with an additional regularizer that encourages similar deep features on clean and adversarial training examples. Also find this summary at [davidstutz.de](https://davidstutz.de/category/reading/).
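The summary does not spell out the regularized objective, but the idea is easy to sketch. Below is a minimal PyTorch illustration of one plausible instantiation, assuming an FGSM attack to generate adversarial examples, an MSE penalty between clean and adversarial deep features, and a hypothetical `features` callable exposing the representation being regularized; the paper's exact attack, feature layer, and loss weighting may differ.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps=8 / 255):
    """Generate adversarial examples with a single FGSM step."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad, = torch.autograd.grad(loss, x_adv)
    return (x_adv + eps * grad.sign()).clamp(0, 1).detach()

def training_step(model, features, x, y, lam=1.0, eps=8 / 255):
    """One step of adversarial training with a feature-consistency regularizer.

    `features` (hypothetical) returns the deep representation, e.g. the
    penultimate-layer activations, whose consistency we want to enforce.
    """
    x_adv = fgsm_attack(model, x, y, eps)
    loss_clean = F.cross_entropy(model(x), y)
    loss_adv = F.cross_entropy(model(x_adv), y)
    # Regularizer: pull the adversarial representation toward the clean one,
    # so that neurons fire similarly on clean and adversarial inputs.
    reg = F.mse_loss(features(x_adv), features(x))
    return loss_clean + loss_adv + lam * reg

# Example wiring (hypothetical): take the penultimate layer of a ResNet.
# model = torchvision.models.resnet18(num_classes=10)
# features = torch.nn.Sequential(*list(model.children())[:-1])
```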
Dong, Yinpeng; Bao, Fan; Su, Hang; Zhu, Jun
arXiv e-Print archive, 2019


Summary by David Stutz