[link]
This paper describes how to find local interpretable modelagnostic explanations (LIME) why a blackbox model $m_B$ came to a classification decision for one sample $x$. The key idea is to evaluate many more samples around $x$ (local) and fit an interpretable model $m_I$ to it. The way of sampling and the kind of interpretable model depends on the problem domain. For computer vision / image classification, the image $x$ is divided into superpixels. Single superpixels are made black, the new image $x'$ is evaluated $p' = m_B(x')$. This is done multiple times. The paper is also explained in [this YouTube video](https://www.youtube.com/watch?v=KP7JtFMLo4) by Marco Tulio Ribeiro. A very similar idea is already in the [Zeiler & Fergus paper](http://www.shortscience.org/paper?bibtexKey=journals/corr/ZeilerF13#martinthoma). ## Followup Paper * June 2016: [ModelAgnostic Interpretability of Machine Learning](https://arxiv.org/abs/1606.05386) * November 2016: * [Nothing Else Matters: ModelAgnostic Explanations By Identifying Prediction Invariance](https://arxiv.org/abs/1611.05817) * [An unexpected unity among methods for interpreting model predictions](https://arxiv.org/abs/1611.07478)
Your comment:
