First published: 2017/02/28 Abstract: As machine learning systems become ubiquitous, there has been a surge of
interest in interpretable machine learning: systems that provide explanation
for their outputs. These explanations are often used to qualitatively assess
other criteria such as safety or non-discrimination. However, despite the
interest in interpretability, there is very little consensus on what
interpretable machine learning is and how it should be measured. In this
position paper, we first define interpretability and describe when
interpretability is needed (and when it is not). Next, we suggest a taxonomy
for rigorous evaluation and expose open questions towards a more rigorous
science of interpretable machine learning.
For a machine learning model to be trusted or used, one would need to be confident in its ability to handle all possible scenarios. However, for complex, global problems, designing unit test cases that cover those scenarios can be costly, bordering on impossible.
**Idea**: We need a basic guideline that researchers and developers can adhere to when defining problems and outlining solutions, so that model interpretability can be defined accurately in terms of the problem statement.
**Solution**: This paper outlines the basics of machine learning interpretability, what it means for different users, and how to classify these needs into understandable categories that can be evaluated. It highlights that the need for interpretability arises from *incompleteness*, either in the problem statement or in the knowledge of the problem domain. The paper provides three main categories for evaluating a model or its explanations:
- *Application Grounded Evaluation*: These are the most costly evaluations: real humans evaluate the model on the real tasks it is meant to perform. The human evaluators need domain knowledge of those tasks.
- *Human Grounded Evaluation*: These evaluations are simpler than application-grounded ones, since the complex task is simplified and humans evaluate the simplified version. Domain knowledge is not necessary.
- *Functionally Grounded Evaluation*: No humans are involved; explanation quality is measured by a formal definition of interpretability that serves as a proxy. This suits cases where a class of models has already been evaluated (e.g., in human experiments), so existing models can be refined or tweaked to optimize that proxy.
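As a rough illustration of a functionally grounded evaluation, the sketch below ranks candidate models using only a formal proxy for interpretability, with no humans involved. The proxy used here (the count of nonzero coefficients in a linear model), the `Candidate` type, and the numbers in the usage example are hypothetical; the paper does not prescribe this particular proxy.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    accuracy: float   # held-out predictive accuracy
    weights: list     # coefficients of a linear model


def sparsity_proxy(weights, tol=1e-9):
    """Hypothetical formal proxy for interpretability: count of nonzero coefficients."""
    return sum(1 for w in weights if abs(w) > tol)


def functionally_grounded_rank(candidates, max_nonzero):
    """Keep candidates whose proxy is within budget, then rank them by accuracy.

    No human judgments are used; the proxy stands in for interpretability.
    """
    admissible = [c for c in candidates if sparsity_proxy(c.weights) <= max_nonzero]
    return sorted(admissible, key=lambda c: c.accuracy, reverse=True)


# Usage with made-up numbers.
candidates = [
    Candidate("dense", 0.91, [0.3, -0.2, 0.5, 0.1]),
    Candidate("sparse", 0.88, [0.7, 0.0, 0.0, 0.2]),
    Candidate("tiny", 0.80, [0.9, 0.0, 0.0, 0.0]),
]
ranked = functionally_grounded_rank(candidates, max_nonzero=2)
print([c.name for c in ranked])  # ['sparse', 'tiny']
```

The most accurate model is excluded because it exceeds the proxy budget, which is exactly the kind of judgment a functionally grounded evaluation delegates to a formal definition.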
This paper also outlines open issues with these three evaluation approaches: certain questions need to be answered before we can pick an evaluation method and metric.
- To identify the factors of interpretability, the paper proposes a data-driven approach: relate each real-world task to the interpretation methods used to fulfill it (e.g., by recording how well each method performs on each task), then analyze which factors of the tasks and methods matter most.
- The paper introduces the term *latent dimensions of interpretability*, i.e., dimensions that are inferred rather than observed. These are divided into task-related and method-related latent dimensions, each a long list of task-specific or method-specific factors (e.g., time constraints and the nature of user expertise for tasks; the form and number of cognitive chunks for methods).
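One simple way to make the data-driven idea concrete is to factorize a task-by-method performance matrix into low-dimensional task and method scores, which play the role of inferred latent dimensions. The sketch below assumes a small, made-up matrix of nonnegative performance scores and uses rank-1 alternating least squares as a stand-in factorization; the paper does not commit to this specific algorithm, and a real study would need many more tasks, methods, and dimensions.

```python
def rank1_factorize(M, iters=50):
    """Approximate M (rows = tasks, columns = methods) by the outer product u v^T.

    Rank-1 alternating least squares: with v fixed, the best u is M v / (v.v);
    with u fixed, the best v is M^T u / (u.u). Assumes M has nonnegative
    entries that are not all zero. u[i] scores task i and v[j] scores method j
    along a single latent dimension.
    """
    n_tasks, n_methods = len(M), len(M[0])
    v = [1.0] * n_methods
    u = [0.0] * n_tasks
    for _ in range(iters):
        vv = sum(x * x for x in v)
        u = [sum(M[i][j] * v[j] for j in range(n_methods)) / vv for i in range(n_tasks)]
        uu = sum(x * x for x in u)
        v = [sum(M[i][j] * u[i] for i in range(n_tasks)) / uu for j in range(n_methods)]
    return u, v


# Hypothetical performance of 3 interpretation methods (columns) on 4 tasks (rows).
scores = [
    [0.9, 0.6, 0.3],
    [0.8, 0.5, 0.2],
    [0.4, 0.3, 0.1],
    [0.7, 0.5, 0.2],
]
task_scores, method_scores = rank1_factorize(scores)
print(task_scores, method_scores)
```

Tasks and methods that land close together along the recovered dimension are candidates for sharing an underlying interpretability factor; naming that factor would still require human analysis.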
Thus, this paper provides a basic taxonomy for how we should evaluate our models and how these evaluations differ from problem to problem. In the ideal scenario it outlines, researchers provide the information needed to evaluate their proposal correctly, i.e., with respect to the domain and the problem scope.
First published: 2017/11/20 Abstract: Transparency, user trust, and human comprehension are popular ethical
motivations for interpretable machine learning. In support of these goals,
researchers evaluate model explanation performance using humans and real world
applications. This alone presents a challenge in many areas of artificial
intelligence. In this position paper, we propose a distinction between
descriptive and persuasive explanations. We discuss reasoning suggesting that
functional interpretability may be correlated with cognitive function and user
preferences. If this is indeed the case, evaluation and optimization using
functional metrics could perpetuate implicit cognitive bias in explanations
that threaten transparency. Finally, we propose two potential research
directions to disambiguate cognitive function and explanation models, retaining
control over the tradeoff between accuracy and interpretability.
Model interpretability aims to explain the inner workings of a model, making the decisions it produces transparent. However, for the sake of human acceptance or understanding, explanations often seem geared more toward earning human trust than toward remaining faithful to the model.
There is a distinct difference, and a tradeoff, between persuasive and descriptive interpretations of a model: persuasive interpretations promote human trust, while descriptive ones stay truthful to the model. Favoring persuasiveness can cost the model its transparency.
**Questions to be answered:**
- How do we balance between a persuasive strategy and a descriptive strategy?
- How do we combat human cognitive bias?
**Proposed research directions:**
- *Separating the descriptive and persuasive steps*:
    - First, generate a descriptive explanation without trying to simplify it.
    - In a final step, add persuasiveness to this explanation to make it more understandable.
- *Explicit inclusion of cognitive features*:
    - Add the attributes that affect our functional measures of interpretability to the objective function.
    - However, this approach has some drawbacks:
        - We would need to map the user's knowledge, which is an expensive process.
        - Any cognitive features left out of the objective function would still contribute to human cognitive bias.
        - Optimizing a multi-objective loss function adds complexity.
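A weighted-sum scalarization is one common way to fold cognitive features into an explanation objective. The sketch below is hypothetical: the `CandidateExplanation` type, the `"length"` feature, and the weights are illustrative assumptions, not the paper's formulation. It shows how raising the weight on a cognitive feature can flip the choice from a more faithful explanation to a simpler, less faithful one.

```python
from dataclasses import dataclass, field


@dataclass
class CandidateExplanation:
    name: str
    fidelity_loss: float                           # disagreement with the explained model
    cognitive: dict = field(default_factory=dict)  # hypothetical cognitive-feature costs


def objective(expl, weights):
    """Weighted-sum (scalarized) multi-objective loss.

    Only cognitive features listed in `weights` are penalized; any feature left
    out is ignored by the optimizer, mirroring the drawback that omitted features
    still feed implicit cognitive bias.
    """
    return expl.fidelity_loss + sum(
        w * expl.cognitive.get(feature, 0.0) for feature, w in weights.items()
    )


def best_explanation(candidates, weights):
    return min(candidates, key=lambda e: objective(e, weights))


candidates = [
    CandidateExplanation("faithful", fidelity_loss=0.05, cognitive={"length": 12.0}),
    CandidateExplanation("simple", fidelity_loss=0.30, cognitive={"length": 2.0}),
]
print(best_explanation(candidates, {}).name)               # faithful
print(best_explanation(candidates, {"length": 0.1}).name)  # simple
```

The single weight is where the persuasive versus descriptive tradeoff becomes an explicit, tunable choice rather than an implicit one.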
**Definitions:**
- *Explanation Strategy*: an explanation vehicle coupled with the objective function, constraints, and hyperparameters required to generate a model explanation.
- *Explanation Model*: the implementation of an explanation strategy, fit to the model being interpreted.
- *Human Cognitive Bias*: if an explanation model is highly persuasive, i.e., tuned toward earning human trust rather than staying true to the model, the overall evaluation of that explanation will be more biased than that of a descriptive model. This bias can arise from commonalities among human users across a domain, from their expertise in the application, or from their expectations of what a model explanation should look like. Such bias is known as implicit human cognitive bias.
- *Persuasive Explanation Strategy*: a strategy that aims to convince the user, or to humanize the model, so that the user feels more comfortable with the decisions it makes. Fidelity, or truthfulness to the model, can be very low under such a strategy, which raises the ethical dilemma of where to draw the line between being persuasive and being descriptive. Persuasive strategies do promote human understanding and cognition, which are important aspects of interpretability, but they fail to address other aspects such as fidelity to the model.
- *Descriptive Explanation Strategy*: a strategy that stays true to the underlying model and generates explanations with maximum fidelity to it. Ideally, such a strategy would describe exactly how the underlying model works internally, which is the main purpose of model interpretation: understanding how the model actually works.
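The strategy and model definitions map naturally onto a small data structure: a strategy bundles a vehicle, an objective, constraints, and hyperparameters, and fitting it to a model yields an explanation model. This is a hypothetical sketch; the class and field names are assumptions chosen to mirror the definitions, and "fitting" is reduced here to picking the best admissible candidate explanation.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ExplanationStrategy:
    vehicle: str                                         # form of explanation, e.g. "threshold rule"
    objective: Callable[[Callable, Any], float]          # (model, candidate) -> loss, lower is better
    constraints: list = field(default_factory=list)      # callables: candidate -> bool
    hyperparameters: dict = field(default_factory=dict)

    def fit(self, model, candidates):
        """Return an explanation model: the admissible candidate with the lowest loss."""
        admissible = [c for c in candidates if all(ok(c) for ok in self.constraints)]
        if not admissible:
            raise ValueError("no candidate satisfies the constraints")
        best = min(admissible, key=lambda c: self.objective(model, c))
        return ExplanationModel(strategy=self, explanation=best)


@dataclass
class ExplanationModel:
    strategy: ExplanationStrategy   # the strategy this explanation implements
    explanation: Any                # the fitted explanation itself


# Usage: explain a hypothetical black box with a rule of the form "x > t".
def black_box(x):
    return x > 5


grid = range(10)


def disagreement(model, t):
    return sum(model(x) != (x > t) for x in grid)


strategy = ExplanationStrategy(
    vehicle="threshold rule",
    objective=disagreement,
    constraints=[lambda t: t >= 4],
    hyperparameters={"grid_size": len(grid)},
)
fitted = strategy.fit(black_box, candidates=[3, 5, 7])
print(fitted.explanation)  # 5
```

Here the objective is purely fidelity-based, so this strategy is descriptive; swapping in an objective that rewards simplicity or user preference would make it more persuasive without changing the structure.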
With the growing use of ML and AI to solve complex problems, there is a rising need to understand and explain these models appropriately. However, explanations vary both in how faithfully they adhere to the model and in how well they explain its decisions in a human-understandable way.
**Idea**: There is no standard way to categorize interpretation methods or explanations, and no established good practices in the field of interpretability.
**Solution**: This paper explores and categorizes different approaches to interpreting machine learning models. The three main categories it proposes are:
- Processing: interpretation approaches that use surrogate models to explain complex models.
- Representation: interpretation approaches that analyze the intermediate data representations inside a model, e.g., by studying how transferable its representations or layers are.
- Explanation-Producing: interpretation approaches in which the trained model, as part of its processing, also generates an explanation of that process.
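To make the Processing category concrete, the sketch below fits a global surrogate, a single threshold rule, to the labels produced by a black-box model rather than to ground truth, and reports how often the two agree. The black box, the one-dimensional inputs, and the stump form are hypothetical choices for illustration; real surrogates are usually richer models over many features.

```python
def fit_stump_surrogate(black_box, xs):
    """Fit a one-threshold rule to the black box's own predictions.

    Returns (agreement, threshold, positive_above): the rule predicts True for
    x >= threshold when positive_above is True, and for x < threshold otherwise.
    Agreement is measured against the black box, not against ground truth.
    """
    labels = [black_box(x) for x in xs]
    best = None
    for t in sorted(set(xs)):
        for positive_above in (True, False):
            preds = [(x >= t) == positive_above for x in xs]
            agreement = sum(p == l for p, l in zip(preds, labels)) / len(xs)
            if best is None or agreement > best[0]:
                best = (agreement, t, positive_above)
    return best


def black_box(x):
    # Hypothetical opaque model.
    return x * x > 10


print(fit_stump_surrogate(black_box, list(range(10))))  # (1.0, 4, True)
```

The agreement score is a simple measure of how complete the surrogate is with respect to the model it explains.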
The paper examines different interpretation approaches in detail, analyzing the major component of each interpretation and which proposed category each explanation method falls under. It also reviews other research papers that categorize or explore explanations, and discusses the meaning of explainability in other domains.
The paper also discusses the tradeoff between "completeness" (how closely the explanation describes the underlying model) and "interpretability" (how easily humans can understand or trust the model). The authors argue that this tradeoff exists not only in the final explanation but also within each category: the definition of completeness, and the metric used to measure it, differ between categories. This makes sense given that different users have different views on how a model should behave and on what the desired explanation for a result is.
Model interpretations must be true to the model but must also promote human understanding of the working of the model. To this end we would need an interpretability model that balances the two.
**Idea**: Although some model interpretations balance fidelity and human cognition at a local level, specific to an underlying model, there are no global, model-agnostic interpretation models that achieve the same.
**Solution**: Approximate the underlying model with compact, non-overlapping decision sets, producing explanations that are faithful to the model and cover its entire feature space.
- How the solution addresses each goal:
    - *Fidelity* (staying true to the model): the labels assigned by the approximation match those of the underlying model.
    - *Unambiguity* (a single clear decision): compact, non-overlapping decision sets in each region of the feature space ensure that every instance receives one unambiguous label.
    - *Interpretability* (understandable by humans): an intuitive rule-based representation with a limited number of rules and predicates.
    - *Interactivity* (letting users focus on specific feature subspaces): the feature space is divided into distinct compact subspaces, so users can focus on their areas of interest.
- Details on a “decision set”:
    - Each decision set is a two-level decision (a nested if-then decision set): the outer if-then clause specifies the subspace (neighborhood), and the inner if-then clause specifies the logic the model uses to assign a label within it.
    - A default rule assigns labels to instances that do not satisfy any of the two-level decisions.
    - The advantage of this structure is that users do not need to trace the logic behind an assigned label very far, which makes it less complex than a decision tree, even though a tree follows a similar if-then structure.
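The two-level structure can be sketched as follows. Representing each condition as a conjunction of `(feature, op, value)` predicates is an assumption made for this example, as are the feature names, thresholds, and labels; this is not the authors' implementation.

```python
import operator

# Hypothetical encoding: a condition is a list of (feature, op, value) predicates
# that must all hold; a rule is an (outer, inner, label) triple.
OPS = {"==": operator.eq, "<": operator.lt, ">=": operator.ge}


def holds(condition, x):
    """True if every predicate in the conjunction holds for instance x (a dict)."""
    return all(OPS[op](x[feature], value) for feature, op, value in condition)


def predict(rules, default_label, x):
    """Outer condition selects the neighborhood; inner condition selects the label.

    If the rules do not overlap (the unambiguity goal), at most one rule matches
    an instance, so returning the first match does not depend on rule order.
    Instances matching no rule receive the default label.
    """
    for outer, inner, label in rules:
        if holds(outer, x) and holds(inner, x):
            return label
    return default_label


rules = [
    ([("age", "<", 50)], [("smoker", "==", True)], "high risk"),
    ([("age", "<", 50)], [("smoker", "==", False)], "low risk"),
    ([("age", ">=", 50)], [("bp", ">=", 140)], "high risk"),
]
print(predict(rules, "low risk", {"age": 30, "smoker": True, "bp": 120}))   # high risk
print(predict(rules, "low risk", {"age": 60, "smoker": False, "bp": 120}))  # low risk
```

The second instance falls into the `age >= 50` neighborhood but fails its inner condition, so it receives the default label; explaining it requires reading only one outer and one inner clause.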
**Mapping fidelity vs interpretability**
- To see how their model handles fidelity versus interpretability, the authors plotted the agreement rate (the fraction of instances whose approximation label matches the label assigned by the black box) against predefined measures of interpretability complexity, such as:
    - Number of predicates (the sum of the widths of all decision sets)
    - Number of rules (each rule being a triple of outer decision, inner decision, and class label)
    - Number of defined neighborhoods (distinct outer if-then decisions)
- Their model reached higher agreement rates than other models at lower interpretability complexity.
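The agreement rate and the three complexity measures can be computed as follows, assuming a rule is represented as an `(outer, inner, label)` triple whose conditions are lists of `(feature, op, value)` predicates (a hypothetical encoding). Counting predicates as the total width of the outer and inner conditions across all rules is one reading of "sum of width of all decision sets"; the paper's exact counting may differ.

```python
def agreement_rate(approx_labels, blackbox_labels):
    """Fraction of instances where the approximation's label matches the black box's."""
    if len(approx_labels) != len(blackbox_labels) or not blackbox_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(approx_labels, blackbox_labels))
    return matches / len(blackbox_labels)


def complexity(rules):
    """Complexity measures for rules given as (outer, inner, label) triples,
    where outer and inner are lists of (feature, op, value) predicates."""
    return {
        "rules": len(rules),
        # Total width: predicates in the outer plus inner condition of every rule.
        "predicates": sum(len(outer) + len(inner) for outer, inner, _ in rules),
        # Distinct outer conditions; frozenset ignores predicate order.
        "neighborhoods": len({frozenset(outer) for outer, _, _ in rules}),
    }


rules = [
    ([("age", "<", 50)], [("smoker", "==", True)], "high risk"),
    ([("age", "<", 50)], [("smoker", "==", False)], "low risk"),
    ([("age", ">=", 50)], [("bp", ">=", 140)], "high risk"),
]
print(complexity(rules))  # {'rules': 3, 'predicates': 6, 'neighborhoods': 2}
print(agreement_rate(["a", "b", "a", "a"], ["a", "b", "b", "a"]))  # 0.75
```

Plotting `agreement_rate` against any of these counts for several candidate approximations gives the kind of fidelity versus interpretability curve described above.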