Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)
Paper summary

Kim et al. propose Concept Activation Vectors (CAVs), which represent directions in a network's feature space corresponding to human-interpretable concepts. Given a network trained for a classification task, a concept is defined by a set of example images exhibiting that concept. A linear classifier is then trained, on the activations of a chosen layer, to distinguish images containing the concept from random images without it. The normal of the resulting decision boundary is the Concept Activation Vector (CAV). Taking the directional derivative of a class logit along this direction for a given input quantifies how sensitive the prediction is to the concept. This way, images can be ranked by concept alignment, and the model's sensitivity to particular concepts can be quantified. The idea is illustrated in Figure 1.

https://i.imgur.com/KOqPeag.png

Figure 1: Process of constructing Concept Activation Vectors (CAVs).

Also find this summary at [davidstutz.de](https://davidstutz.de/category/reading/).
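To make the procedure concrete, here is a minimal sketch in Python (NumPy and scikit-learn) of training a CAV and computing a TCAV-style score on synthetic activations. All names, the synthetic data, and the toy logit function are placeholders rather than the authors' implementation, and the directional derivative is approximated by finite differences instead of backpropagation.

```python
# Minimal sketch of CAV construction and a TCAV-style score,
# assuming layer activations are already available. All data here is
# synthetic; names like tcav_sensitivity are hypothetical placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-ins for activations at a chosen layer: n examples x d features.
concept_acts = rng.normal(loc=1.0, size=(100, 64))  # images showing the concept
random_acts = rng.normal(loc=0.0, size=(100, 64))   # random counterexamples

# Train a linear classifier to separate concept from random activations.
X = np.vstack([concept_acts, random_acts])
y = np.concatenate([np.ones(100), np.zeros(100)])
clf = LogisticRegression(max_iter=1000).fit(X, y)

# The CAV is the normal of the learned decision boundary (the weight vector).
cav = clf.coef_.ravel()
cav /= np.linalg.norm(cav)

def tcav_sensitivity(h_k, act, cav, eps=1e-2):
    """Directional derivative of the class-k logit h_k along the CAV,
    approximated by a finite difference."""
    return (h_k(act + eps * cav) - h_k(act)) / eps

# Toy linear logit standing in for the rest of the network above the layer.
w_k = rng.normal(size=64)
h_k = lambda a: a @ w_k

# TCAV score: fraction of class-k inputs whose logit increases along the CAV.
acts_of_class_k = rng.normal(size=(50, 64))
sens = np.array([tcav_sensitivity(h_k, a, cav) for a in acts_of_class_k])
tcav_score = (sens > 0).mean()
print(f"TCAV score: {tcav_score:.2f}")
```

In the paper, this score is additionally validated with statistical testing: CAVs are retrained against multiple random sets, and only concepts whose scores differ significantly from chance are reported.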
Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)
Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viégas, and Rory Sayres
arXiv e-Print archive, 2017
Keywords: stat.ML

Summary by David Stutz