First published: 2016/02/16 (4 years ago) Abstract: Despite widespread adoption, machine learning models remain mostly black
boxes. Understanding the reasons behind predictions is, however, quite
important in assessing trust, which is fundamental if one plans to take action
based on a prediction, or when choosing whether to deploy a new model. Such
understanding also provides insights into the model, which can be used to
transform an untrustworthy model or prediction into a trustworthy one. In this
work, we propose LIME, a novel explanation technique that explains the
predictions of any classifier in an interpretable and faithful manner, by
learning an interpretable model locally around the prediction. We also propose
a method to explain models by presenting representative individual predictions
and their explanations in a non-redundant way, framing the task as a submodular
optimization problem. We demonstrate the flexibility of these methods by
explaining different models for text (e.g. random forests) and image
classification (e.g. neural networks). We show the utility of explanations via
novel experiments, both simulated and with human subjects, on various scenarios
that require trust: deciding if one should trust a prediction, choosing between
models, improving an untrustworthy classifier, and identifying why a classifier
should not be trusted.
Although Machine learning models have been accepted widely as the next step towards simplifying complex problems, the inner workings of a machine learning model are still unclear and these details can lead to an increase in trust of the model prediction, and the model itself.
**Idea: ** A good explanation system that can justify the prediction of a classifier and can lead to diagnosing the reasoning behind a model can exponentially raise one’s trust in the predictive model.
**Solution: ** This paper proposes a local explanation model called LIME, that approximates a linear local explanation with respect to a data point. The paper outlines desired characteristics for explainers and expounds on how LIME matches to these characteristics, the characteristics being 1) Interpretable 2) Local Fidelity 3) Model-Agnostic and 4) Provides a global perspective. This paper also explores the concept of Fidelity-Interpretability Trade-off; The more complex a model is the less interpretable a completely faithful explanation would be, thus a balance needs to be struck between interpretability and fidelity for complex models. The paper outlines in detail how the proposed LIME explanation model works, for different types of predictive classifiers. LIME works by generating random data points around a test data point and approximating a linear explanation for these randomized points. Thus, LIME works on a rather large assumption that every complex model is linear on a microscopic level. This assumption although large seems justified for most models, although this could lead to certain global issues when analyzing a complex model on the whole.
First published: 2018/10/08 (1 year ago) Abstract: Saliency methods have emerged as a popular tool to highlight features in an
input deemed relevant for the prediction of a learned model. Several saliency
methods have been proposed, often guided by visual appeal on image data. In
this work, we propose an actionable methodology to evaluate what kinds of
explanations a given method can and cannot provide. We find that reliance,
solely, on visual assessment can be misleading. Through extensive experiments
we show that some existing saliency methods are independent both of the model
and of the data generating process. Consequently, methods that fail the
proposed tests are inadequate for tasks that are sensitive to either data or
model, such as, finding outliers in the data, explaining the relationship
between inputs and outputs that the model learned, and debugging the model. We
interpret our findings through an analogy with edge detection in images, a
technique that requires neither training data nor model. Theory in the case of
a linear model and a single-layer convolutional neural network supports our
**Idea:** With the growing use of visual explanation systems of machine learning models such as saliency maps, there needs to be a standardized method of verifying if a saliency method is correctly describing the underlying ML model.
**Solution:** In this paper two Sanity Checks have been proposed to verify the accuracy and the faithfulness of a saliency method:
* *Model parameter randomization test:* In this sanity check the outputs of a saliency method on a trained model is compared to that of the same method on an untrained randomly parameterized model. If these images are similar/identical then this saliency method does not correctly describe the model. In the course of this experiment it is found that certain methods such as the Guided BackProp are constant in their explanations despite alterations in the model.
* *Data Randomization Test:* This method explores the relationship of saliency methods to data and their associated labels. In this test, the labels of the training data are randomized thus there should be no definite pattern describing the model (Since the model is as good as randomly guessing an output label). If there is a definite pattern, this shows that the saliency methods are independent of the underlying model/training data labels. In this test as well Guided BackProp did not fare well, implying this saliency method is as good as an edge detector as opposed to a ML explainer.
Thus this paper makes a valid argument toward having standardized tests that an interpretation model must satisfy to be deemed accurate or faithful.
Tree-based ML models are becoming increasingly popular, but in the explanation space for these type of models is woefully lacking explanations on a local level. Local level explanations can give a clearer picture on specific use-cases and help pin point exact areas where the ML model maybe lacking in accuracy.
**Idea**: We need a local explanation system for trees, that is not based on simple decision path, but rather weighs each feature in comparison to every other feature to gain better insight on the model's inner workings.
**Solution**: This paper outlines a new methodology using SHAP relative values, to weigh pairs of features to get a better local explanation of a tree-based model. The paper also outlines how we can garner global level explanations from several local explanations, using the relative score for a large sample space. The paper also walks us through existing methodologies for local explanation, and why these are biased toward tree depth as opposed to actual feature importance.
The proposed explanation model titled TreeExplainer exposes methods to compute optimal local explanation, garner global understanding from local explanations, and capture feature interaction within a tree based model.
This method assigns Shapley interaction values to pairs of features essentially ranking the features so as to understand which features have a higher impact on overall outcomes, and analyze feature interaction.
First published: 2017/02/28 (3 years ago) Abstract: As machine learning systems become ubiquitous, there has been a surge of
interest in interpretable machine learning: systems that provide explanation
for their outputs. These explanations are often used to qualitatively assess
other criteria such as safety or non-discrimination. However, despite the
interest in interpretability, there is very little consensus on what
interpretable machine learning is and how it should be measured. In this
position paper, we first define interpretability and describe when
interpretability is needed (and when it is not). Next, we suggest a taxonomy
for rigorous evaluation and expose open questions towards a more rigorous
science of interpretable machine learning.
For a machine learning model to be trusted/ used one would need to be confident in its capabilities of dealing with all possible scenarios. To that end, designing unit test cases for more complex and global problems could be costly and bordering on impossible to create.
**Idea**: We need a basic guideline that researchers and developers can adhere to when defining problems and outlining solutions, so that model interpretability can be defined accurately in terms of the problem statement.
**Solution**: This paper outlines the basics of machine learning interpretability, what that means for different users, and how to classify these into understandable categories that can be evaluated. This paper highlights the need for interpretability, which arises from *incompleteness*,either of the problem statement, or the problem domain knowledge. This paper provides three main categories to evaluating a model/ providing interpretations:
- *Application Grounded Evaluation*: These evaluations are more costly, and involve real humans evaluating real tasks that a model would take up. Domain knowledge is necessary for the humans evaluating the real task handled by the model.
- *Human Grounded Evaluation:* these evaluations are simpler than application grounded, as they simplify the complex task and have humans evaluate the simplified task. Domain knowledge is not necessary in such an evaluation.
- *Functionally Grounded Evaluation:* No humans are involved in this version of evaluation, here previously evaluated models are perfected or tweaked to optimize certain functionality. Explanation quality is measured by a formal definition of interpretability.
This paper also outlines certain issues with the above three evaluation processes, there are certain questions that need answering before we can pick an evaluation method and metric.
-To highlight the factors of interpretability, we are provided with the Data-driven approach. Here we analyze each task and the various methods used to fulfill the task and see which of these methods and tasks are most significant to the model.
- We are introduced to the term latent dimensions of interpretability, i.e. dimensions that are inferred not observed. These are divided into task related latent dimensions and method related latent dimensions, these are a long list of factors that are task specific or method specific.
Thus this paper provides a basic taxonomy for how we should evaluate our model, and how these evaluations differ from problem to problem. The ideal scenario outlined is that researchers provide the relevant information to evaluate their proposition correctly (correctly in terms of the domain and the problem scope).
First published: 2017/11/20 (2 years ago) Abstract: Transparency, user trust, and human comprehension are popular ethical
motivations for interpretable machine learning. In support of these goals,
researchers evaluate model explanation performance using humans and real world
applications. This alone presents a challenge in many areas of artificial
intelligence. In this position paper, we propose a distinction between
descriptive and persuasive explanations. We discuss reasoning suggesting that
functional interpretability may be correlated with cognitive function and user
preferences. If this is indeed the case, evaluation and optimization using
functional metrics could perpetuate implicit cognitive bias in explanations
that threaten transparency. Finally, we propose two potential research
directions to disambiguate cognitive function and explanation models, retaining
control over the tradeoff between accuracy and interpretability.
Model Interpretability aims at explaining the inner workings of a model promoting transparency of any decisions made by the model, however for the sake of human acceptance or understanding, these explanations seem to be more geared toward human trust than remaining faithful to the model.
There is a distinct difference and tradeoff between persuasive and descriptive Interpretations of a model, one promotes human trust while the other stays truthful to the model. Promoting the former can lead to a loss in transparency of the model.
**Questions to be answered:**
- How do we balance between a persuasive strategy and a descriptive strategy?
- How do we combat human cognitive bias?
- *Separating the descriptive and persuasive steps: *
- We first generate a descriptive explanation, without trying to simplify it
- In our final steps we add persuasiveness to this explanation to make it more understandable
- *Explicit inclusion of cognitive features:*
- We would include attributes that affect our functional measures of interpretability to our objective function.
- This approach has some drawbacks however:
- we would need to map the knowledge of the user which is an expensive process.
- Any features that we fail to add to the objective function would add to the human cognitive bias
- Increased complexity in optimizing of a multi-objective loss function.
- *Explanation Strategy*: An explanation strategy is defined as an explanation vehicle coupled with the objective function, constraints, and hyper parameters required to generate a model explanation
- *Explanation model*: An explanation model is defined as the implementation of an explanation strategy, which is fit to a model that is to be interpreted.
- *Human Cognitive Bias*: if an explanation model is highly persuasive or tuned toward human trust as opposed to staying true to the model, the overall evaluation of this explanation would be highly biased compared to a descriptive model. This bias can lead from commonalities between human users across a domain, expertise of the application, or the expectation of a model explanation. Such bias is known as implicit human cognitive bias.
- *Persuasive Explanation Strategy*: A persuasive explanation strategy aim at convincing a user/ humanizing a model so that the user feels more comfortable with the decisions generated by the model. Fidelity or truthfulness to the model in such a strategy can be very low, which can lead to ethical dilemmas as to where to draw the line between being persuasive and being descriptive. Persuasive strategies do promote human understanding and cognition, which are important aspects of interpretability, however they fail to address the certain other aspects such as fidelity to the model.
- *Descriptive Explanation Strategy*: A descriptive explanation strategy stays true to the underlying model, and generates explanations with maximum fidelity to the model. Ideally such a strategy would describe exactly what the inner working of the underlying model is, which is the main purpose of model interpretation in terms of better understanding the actual workings of the model.
With growing use of ML and AI solutions to complex problems, there is a rise in need for understanding and explaining these models appropriately however these explanations vary in how well they adhere to the model/ explain the decisions in a human understandable way.
**Idea** : There is no standard method of categorizing interpretation methods/ explanations, and no good working practices in the field of interpretability.
**Solution** : This paper explores and categorizes different approaches to interpreting machine learning models. The three main categories this paper proposes are:
- Processing: interpretation approach that uses surrogate models to explain complex models
- Representation: interpretation approach that analyzes intermediate data representations in models with transferability of data/ layers
- Explaining Producing: interpretation approach in which the trained model as part of it's processing also generates an explanation for its process.
In this paper we see different approaches to interpretation in detail, analyzing what the major component is to the interpretation, And which proposed category the explanation method would fall under. The paper goes into detail about other research papers that also deal with categorizing or exploring explanations, and the overall meaning of explainability in other domains.
This paper also touches on how "completeness" (defined as how close the explanation is to the underlying model) and "interpretation" (defined as how easily humans can understand/ trust the model) do have tradeoffs, the author argues that these tradeoffs not only exist in the final explanation, but within each category the definition of completeness would be different and the metric used to measure this would change, which makes sense when you think that different users have different viewpoints on how a model should behave, and what the desired explanation for a result is.
Model interpretations must be true to the model but must also promote human understanding of the working of the model. To this end we would need an interpretability model that balances the two.
**Idea** : Although there exist model interpretations that balance fidelity and human cognition on a local level specific to an underlying model, there are no global model agnostic interpretation models that can achieve the same.
- Break up each aspect of the underlying model into distinct compact decision sets that have no overlap to generate explanations that are faithful to the model, and also cover all possible feature spaces of the model.
- How the solution dealt with:
- *Fidelity* (staying true to the model): the labels in the approximation match that of the underlying model.
- *Unambiguity* (single clear decision): compact decision sets in every feature space ensures unambiguity in the label assigned to it.
- *Interpretability* (Understandable by humans): Intuitive rule based representation, with limited number of rules and predicates.
- *Interactivity* (Allow user to focus on specific feature spaces): Each feature space is divided into distinct compact sets, allowing users to focus on their area of interest.
- Details on a “decision set”:
- Each decision set is a two-level decision (a nested if-then decision set), where the outer if-then clause specifies the sub-space, and the inner if-then clause specifies the logic of assigning a label by the model.
- A default set is defined to assign labels that do not satisfy any of the two-level decisions
- The pros of such a model is that we do not need to trace the logic of an assigned label too far, thus less complex than a decision tree which follows a similar if-then structure.
**Mapping fidelity vs interpretability**
- To see how their model handled fidelity vs interpretability, they mapped the rate of agreement (number of times the approximation label of an instance matches the blackbox assigned label) against pre-defined interpretability complexity defining terms such as:
- Number of predicates (sum of width of all decision sets)
- Number of rules (a set of outer decision, inner decision, and classifier label)
- Number of defined neighborhoods (outer if-then decision)
- Their model reached higher agreement rates to other models at lower values for interpretability complexity.