The Promise and Peril of Human Evaluation for Model Interpretability on ShortScience.org


The Promise and Peril of Human Evaluation for Model Interpretability
Bernease Herman
arXiv e-Print archive - 2017 via Local arXiv
Keywords: cs.AI, cs.LG, stat.ML


Summary by Apoorva Shetty 4 years ago

Model interpretability aims to explain the inner workings of a model and to promote transparency about the decisions the model makes. For the sake of human acceptance or understanding, however, these explanations often seem geared more toward earning human trust than toward remaining faithful to the model.

**Idea**
There is a distinct difference, and a tradeoff, between persuasive and descriptive interpretations of a model: the former promotes human trust, while the latter stays truthful to the model. Favoring persuasion can cost the model its transparency.

**Questions to be answered:**
- How do we balance between a persuasive strategy and a descriptive strategy?
- How do we combat human cognitive bias?

**Solutions:**
- *Separating the descriptive and persuasive steps:*
    - We first generate a descriptive explanation, without trying to simplify it.
    - In a final step, we add persuasiveness to this explanation to make it more understandable.
- *Explicit inclusion of cognitive features:*
    - We would add the attributes that affect our functional measures of interpretability to our objective function.
    - This approach has some drawbacks, however:
        - We would need to map the knowledge of the user, which is an expensive process.
        - Any features that we fail to add to the objective function would still contribute to human cognitive bias.
        - Optimizing a multi-objective loss function is more complex.
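
The two solutions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's method: `toy_model`, the finite-difference explanation, the top-k simplification, the probe points, and the weight `lam` are all hypothetical choices made for the example. The number of non-zero weights stands in for a "cognitive feature" such as explanation size.

```python
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]


def toy_model(x: Sequence[float]) -> float:
    """Hypothetical stand-in for the model being explained."""
    return 3.0 * x[0] - 2.0 * x[1] + 0.5 * x[2]


def descriptive_explanation(model: Model, x: Sequence[float],
                            eps: float = 1e-4) -> List[float]:
    """Step 1: per-feature sensitivities at x (central differences), unsimplified."""
    weights = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        weights.append((model(up) - model(down)) / (2 * eps))
    return weights


def persuasive_view(weights: Sequence[float], top_k: int) -> List[float]:
    """Step 2: keep the top_k largest-magnitude weights and zero out the rest."""
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    keep = set(ranked[:top_k])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]


def infidelity(model: Model, x: Sequence[float], weights: Sequence[float],
               probes: Sequence[Sequence[float]]) -> float:
    """Mean squared gap between the model's change and the explanation's prediction."""
    base = model(x)
    total = 0.0
    for delta in probes:
        shifted = [xi + di for xi, di in zip(x, delta)]
        predicted = sum(w * di for w, di in zip(weights, delta))
        total += (model(shifted) - base - predicted) ** 2
    return total / len(probes)


def combined_objective(model: Model, x: Sequence[float], weights: Sequence[float],
                       probes: Sequence[Sequence[float]], lam: float) -> float:
    """Infidelity plus a weighted cognitive cost (assumed proxy: non-zero weights)."""
    cognitive_cost = sum(1 for w in weights if w != 0.0)
    return infidelity(model, x, weights, probes) + lam * cognitive_cost


x0 = [1.0, 1.0, 1.0]
probes = [[0.1, 0.0, 0.0], [0.0, 0.1, 0.0], [0.0, 0.0, 0.1]]

full = descriptive_explanation(toy_model, x0)   # separated approach, step 1
short = persuasive_view(full, top_k=1)          # separated approach, step 2
print("infidelity (descriptive):", infidelity(toy_model, x0, full, probes))
print("infidelity (persuasive): ", infidelity(toy_model, x0, short, probes))

# Explicit-inclusion approach: pick the explanation size that minimizes the joint loss.
best_k = min(
    range(1, len(full) + 1),
    key=lambda k: combined_objective(toy_model, x0, persuasive_view(full, k), probes, lam=0.01),
)
print("explanation size chosen by the joint objective:", best_k)
```

In the separated approach, the fidelity lost in the second step stays measurable because the unsimplified explanation is kept. In the joint objective, the result depends on `lam` and on which cognitive features are modeled, which is where the drawbacks listed above come in.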



**Important terms:**
- *Explanation Strategy*: An explanation strategy is defined as an explanation vehicle coupled with the objective function, constraints, and hyperparameters required to generate a model explanation.
- *Explanation model*: An explanation model is defined as the implementation of an explanation strategy, which is fit to a model that is to be interpreted.
- *Human Cognitive Bias*: If an explanation model is highly persuasive, or tuned toward human trust rather than toward staying true to the model, its overall evaluation will be highly biased compared to that of a descriptive model. This bias can arise from commonalities between human users across a domain, from expertise in the application, or from expectations of a model explanation. Such bias is known as implicit human cognitive bias.
- *Persuasive Explanation Strategy*: A persuasive explanation strategy aims at convincing a user, or humanizing a model, so that the user feels more comfortable with the decisions the model generates. Fidelity, or truthfulness to the model, can be very low in such a strategy, which raises ethical dilemmas about where to draw the line between being persuasive and being descriptive. Persuasive strategies do promote human understanding and cognition, which are important aspects of interpretability, but they fail to address certain other aspects, such as fidelity to the model.
- *Descriptive Explanation Strategy*: A descriptive explanation strategy stays true to the underlying model and generates explanations with maximum fidelity to it. Ideally, such a strategy would describe exactly how the underlying model works internally, which is the main purpose of model interpretation: better understanding the actual workings of the model.
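
To make the first two terms concrete, here is a minimal sketch of how an explanation strategy and an explanation model might be represented. Everything in it is an assumption made for illustration: the class and field names, the one-dimensional linear vehicle, the squared-error objective, the `max_abs_slope` constraint, the sampling hyperparameters, and the `black_box` function being explained.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence


def squared_error(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared gap between the explanation's outputs and the model's outputs."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


@dataclass(frozen=True)
class ExplanationStrategy:
    """Vehicle + objective + constraints + hyperparameters (names are hypothetical)."""
    vehicle: str
    objective: Callable[[Sequence[float], Sequence[float]], float]
    constraints: Dict[str, float] = field(default_factory=dict)
    hyperparameters: Dict[str, float] = field(default_factory=dict)


@dataclass
class ExplanationModel:
    """A strategy fitted to one particular model; here the vehicle is a line."""
    strategy: ExplanationStrategy
    slope: float = 0.0
    intercept: float = 0.0

    def sample_points(self) -> List[float]:
        hp = self.strategy.hyperparameters
        lo, hi, n = hp.get("lo", 0.0), hp.get("hi", 1.0), int(hp.get("n_samples", 5))
        return [lo + (hi - lo) * i / (n - 1) for i in range(n)]

    def fit(self, model: Callable[[float], float]) -> "ExplanationModel":
        # Closed-form least squares; valid here only because the assumed
        # objective is squared error and the vehicle is a 1-D line.
        xs = self.sample_points()
        ys = [model(x) for x in xs]
        mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
        var_x = sum((x - mean_x) ** 2 for x in xs)
        cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        self.slope = cov_xy / var_x
        cap = self.strategy.constraints.get("max_abs_slope")
        if cap is not None:  # apply the constraint by clipping the slope
            self.slope = max(-cap, min(cap, self.slope))
        self.intercept = mean_y - self.slope * mean_x
        return self

    def predict(self, x: float) -> float:
        return self.slope * x + self.intercept

    def loss(self, model: Callable[[float], float]) -> float:
        xs = self.sample_points()
        return self.strategy.objective([self.predict(x) for x in xs],
                                       [model(x) for x in xs])


def black_box(x: float) -> float:
    """Hypothetical model to be interpreted."""
    return 2.0 * x + 1.0


free = ExplanationStrategy("line", squared_error,
                           hyperparameters={"lo": 0.0, "hi": 1.0, "n_samples": 5})
capped = ExplanationStrategy("line", squared_error,
                             constraints={"max_abs_slope": 1.0},
                             hyperparameters={"lo": 0.0, "hi": 1.0, "n_samples": 5})

explainer = ExplanationModel(free).fit(black_box)
constrained = ExplanationModel(capped).fit(black_box)
print("unconstrained loss:", explainer.loss(black_box))
print("constrained loss:  ", constrained.loss(black_box))
```

The same strategy object can be fit to different models, which mirrors the distinction above: the strategy is the recipe, and the explanation model is one fitted instance of it. The capped variant shows how a constraint can trade fidelity for a simpler or more acceptable explanation.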
