"Why Should I Trust You?": Explaining the Predictions of Any Classifier"Why Should I Trust You?": Explaining the Predictions of Any ClassifierMarco Tulio Ribeiro and Sameer Singh and Carlos Guestrin2016
Paper summaryapoorvashettyAlthough Machine learning models have been accepted widely as the next step towards simplifying complex problems, the inner workings of a machine learning model are still unclear and these details can lead to an increase in trust of the model prediction, and the model itself.
**Idea: ** A good explanation system that can justify the prediction of a classifier and can lead to diagnosing the reasoning behind a model can exponentially raise one’s trust in the predictive model.
**Solution: ** This paper proposes a local explanation model called LIME, that approximates a linear local explanation with respect to a data point. The paper outlines desired characteristics for explainers and expounds on how LIME matches to these characteristics, the characteristics being 1) Interpretable 2) Local Fidelity 3) Model-Agnostic and 4) Provides a global perspective. This paper also explores the concept of Fidelity-Interpretability Trade-off; The more complex a model is the less interpretable a completely faithful explanation would be, thus a balance needs to be struck between interpretability and fidelity for complex models. The paper outlines in detail how the proposed LIME explanation model works, for different types of predictive classifiers. LIME works by generating random data points around a test data point and approximating a linear explanation for these randomized points. Thus, LIME works on a rather large assumption that every complex model is linear on a microscopic level. This assumption although large seems justified for most models, although this could lead to certain global issues when analyzing a complex model on the whole.
First published: 2016/02/16 (4 years ago) Abstract: Despite widespread adoption, machine learning models remain mostly black
boxes. Understanding the reasons behind predictions is, however, quite
important in assessing trust, which is fundamental if one plans to take action
based on a prediction, or when choosing whether to deploy a new model. Such
understanding also provides insights into the model, which can be used to
transform an untrustworthy model or prediction into a trustworthy one. In this
work, we propose LIME, a novel explanation technique that explains the
predictions of any classifier in an interpretable and faithful manner, by
learning an interpretable model locally around the prediction. We also propose
a method to explain models by presenting representative individual predictions
and their explanations in a non-redundant way, framing the task as a submodular
optimization problem. We demonstrate the flexibility of these methods by
explaining different models for text (e.g. random forests) and image
classification (e.g. neural networks). We show the utility of explanations via
novel experiments, both simulated and with human subjects, on various scenarios
that require trust: deciding if one should trust a prediction, choosing between
models, improving an untrustworthy classifier, and identifying why a classifier
should not be trusted.
This paper describes how to find local interpretable model-agnostic explanations (LIME) why a black-box model $m_B$ came to a classification decision for one sample $x$. The key idea is to evaluate many more samples around $x$ (local) and fit an interpretable model $m_I$ to it. The way of sampling and the kind of interpretable model depends on the problem domain.
For computer vision / image classification, the image $x$ is divided into superpixels. Single super-pixels are made black, the new image $x'$ is evaluated $p' = m_B(x')$. This is done multiple times.
The paper is also explained in [this YouTube video](https://www.youtube.com/watch?v=KP7-JtFMLo4) by Marco Tulio Ribeiro.
A very similar idea is already in the [Zeiler & Fergus paper](http://www.shortscience.org/paper?bibtexKey=journals/corr/ZeilerF13#martinthoma).
## Follow-up Paper
* June 2016: [Model-Agnostic Interpretability of Machine Learning](https://arxiv.org/abs/1606.05386)
* November 2016:
* [Nothing Else Matters: Model-Agnostic Explanations By Identifying Prediction Invariance](https://arxiv.org/abs/1611.05817)
* [An unexpected unity among methods for interpreting