Understanding Black-box Predictions via Influence Functions on ShortScience.org

Understanding Black-box Predictions via Influence Functions
Koh, Pang Wei and Liang, Percy
International Conference on Machine Learning - 2017

Summary by kangcheng 5 years ago

**Goal**: identify the training points most responsible for a given prediction.

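Given training points $z_1, \dots, z_n$, let the empirical risk be $\frac{1}{n}\sum_{i=1}^n L(z_i, \theta)$ and let $\hat{\theta} := \arg\min_{\theta \in \Theta} \frac{1}{n}\sum_{i=1}^n L(z_i, \theta)$ be its minimizer.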

The influence function measures how the parameters change when a training point $z$ is upweighted by a small $\epsilon$. The upweighted parameters are
$$\hat{\theta}_{\epsilon, z} := \arg \min_{\theta \in \Theta} \frac{1}{n}\sum_{i=1}^n L(z_i, \theta) + \epsilon L(z, \theta)$$

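$$\mathcal{I}_{\text{up, params}}(z) := \left.\frac{d\hat{\theta}_{\epsilon, z}}{d\epsilon}\right|_{\epsilon=0} = -H_{\hat{\theta}}^{-1} \nabla_\theta L(z, \hat{\theta})$$
where $H_{\hat{\theta}} := \frac{1}{n}\sum_{i=1}^n \nabla_\theta^2 L(z_i, \hat{\theta})$ is the Hessian of the empirical risk, assumed positive definite.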

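$\mathcal{I}_{\text{up, params}}(z)$ shows how upweighting a single point $z$ affects the parameter estimate $\hat{\theta}$. Removing $z$ is the same as upweighting it by $\epsilon = -\frac{1}{n}$, so the parameters obtained without $z$, $\hat{\theta}_{-z}$, can be estimated without retraining: $\hat{\theta}_{-z} - \hat{\theta} \approx -\frac{1}{n}\mathcal{I}_{\text{up, params}}(z)$.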
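
A minimal pure-Python sketch can make $\mathcal{I}_{\text{up, params}}$ concrete. It assumes a toy linear regression with squared loss $L(z, \theta) = \frac{1}{2}(wx + b - y)^2$, $\theta = (w, b)$, and five made-up points; the helper names (`fit`, `grad`, `hessian`, `influence_params`) are hypothetical. Because this loss is quadratic, $H_{\hat{\theta}}$ is constant and everything has a closed form. The sketch checks the formula against a finite-difference refit of the upweighted objective, then compares the removal estimate ($\epsilon = -\frac{1}{n}$) with actually retraining without $z$.

```python
# Minimal sketch: I_up,params for a toy linear regression (hypothetical data).
# Loss L(z, theta) = 0.5 * (w*x + b - y)**2 with theta = (w, b).

DATA = [(0.0, 1.0), (1.0, 2.9), (2.0, 5.2), (3.0, 6.8), (4.0, 9.1)]  # z_i = (x_i, y_i)


def solve2(A, v):
    """Solve the 2x2 linear system A s = v by Cramer's rule."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [(d * v[0] - b * v[1]) / det, (a * v[1] - c * v[0]) / det]


def fit(weighted):
    """Closed-form argmin over theta of sum_j c_j * L(z_j, theta), for pairs (c_j, z_j)."""
    a11 = sum(c * x * x for c, (x, _) in weighted)
    a12 = sum(c * x for c, (x, _) in weighted)
    a22 = sum(c for c, _ in weighted)
    r1 = sum(c * x * y for c, (x, y) in weighted)
    r2 = sum(c * y for c, (_, y) in weighted)
    return solve2([[a11, a12], [a12, a22]], [r1, r2])


def grad(z, theta):
    """Gradient of L(z, theta) with respect to theta = (w, b)."""
    (x, y), (w, b) = z, theta
    r = w * x + b - y
    return [r * x, r]


def hessian(data):
    """H = (1/n) * sum_i Hessian of L(z_i, .); constant because L is quadratic."""
    n = len(data)
    sxx = sum(x * x for x, _ in data) / n
    sx = sum(x for x, _ in data) / n
    return [[sxx, sx], [sx, 1.0]]


def influence_params(z, data, theta_hat):
    """I_up,params(z) = -H^{-1} grad L(z, theta_hat)."""
    s = solve2(hessian(data), grad(z, theta_hat))
    return [-s[0], -s[1]]


n = len(DATA)
theta_hat = fit([(1.0 / n, zi) for zi in DATA])
idx = 4
z = DATA[idx]
infl = influence_params(z, DATA, theta_hat)

# Finite-difference check: refit with z upweighted by a small eps.
eps = 1e-6
theta_eps = fit([(1.0 / n, zi) for zi in DATA] + [(eps, z)])
fd = [(te - th) / eps for te, th in zip(theta_eps, theta_hat)]

# Removing z corresponds to eps = -1/n; compare with actually retraining without z.
approx_removal = [-v / n for v in infl]
theta_loo = fit([(1.0, zi) for i, zi in enumerate(DATA) if i != idx])
exact_removal = [tl - th for tl, th in zip(theta_loo, theta_hat)]

print("influence:", infl)
print("finite difference:", fd)
print("removal estimate:", approx_removal, "exact:", exact_removal)
```

On this data the finite-difference derivative matches the closed-form influence $(0.08, -0.08)$ for the last point. The removal estimate $(-0.016, 0.016)$ points in the right direction but is smaller than the exact change $(-0.04, 0.04)$; that point has high leverage and $n = 5$, so this toy case says little about accuracy on larger datasets.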

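Applying the chain rule gives the effect of upweighting $z$ on the loss at a test point $z_{\text{test}}$:
$$\mathcal{I}_{\text{up, loss}}(z, z_{\text{test}}) := \left.\frac{dL(z_{\text{test}}, \hat{\theta}_{\epsilon, z})}{d\epsilon}\right|_{\epsilon=0} = \nabla_\theta L(z_{\text{test}}, \hat{\theta})^\top \mathcal{I}_{\text{up, params}}(z) = -\nabla_\theta L(z_{\text{test}}, \hat{\theta})^\top H_{\hat{\theta}}^{-1} \nabla_\theta L(z, \hat{\theta})$$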
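
The ranking behind the stated goal can be sketched in a few lines of pure Python, assuming a toy squared-loss linear regression with made-up training and test points; the helper names are hypothetical. It scores each training point by $\mathcal{I}_{\text{up, loss}}(z_i, z_{\text{test}})$: a positive score means upweighting $z_i$ would raise the test loss (harmful), a negative one that it would lower it (helpful).

```python
# Minimal sketch: rank training points by I_up,loss(z, z_test) for a toy linear
# regression with L(z, theta) = 0.5 * (w*x + b - y)**2 (hypothetical data).

DATA = [(0.0, 1.0), (1.0, 2.9), (2.0, 5.2), (3.0, 6.8), (4.0, 9.1)]  # z_i = (x_i, y_i)
Z_TEST = (5.0, 10.5)  # hypothetical test point


def solve2(A, v):
    """Solve the 2x2 linear system A s = v by Cramer's rule."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [(d * v[0] - b * v[1]) / det, (a * v[1] - c * v[0]) / det]


def fit(weighted):
    """Closed-form argmin over theta of sum_j c_j * L(z_j, theta), for pairs (c_j, z_j)."""
    a11 = sum(c * x * x for c, (x, _) in weighted)
    a12 = sum(c * x for c, (x, _) in weighted)
    a22 = sum(c for c, _ in weighted)
    r1 = sum(c * x * y for c, (x, y) in weighted)
    r2 = sum(c * y for c, (_, y) in weighted)
    return solve2([[a11, a12], [a12, a22]], [r1, r2])


def loss(z, theta):
    """L(z, theta) = 0.5 * (w*x + b - y)**2."""
    (x, y), (w, b) = z, theta
    return 0.5 * (w * x + b - y) ** 2


def grad(z, theta):
    """Gradient of L(z, theta) with respect to theta = (w, b)."""
    (x, y), (w, b) = z, theta
    r = w * x + b - y
    return [r * x, r]


def hessian(data):
    """H = (1/n) * sum_i Hessian of L(z_i, .); constant because L is quadratic."""
    n = len(data)
    sxx = sum(x * x for x, _ in data) / n
    sx = sum(x for x, _ in data) / n
    return [[sxx, sx], [sx, 1.0]]


def influence_loss(z, z_test, data, theta_hat):
    """I_up,loss(z, z_test) = -grad L(z_test)^T H^{-1} grad L(z)."""
    s = solve2(hessian(data), grad(z, theta_hat))  # H^{-1} grad L(z)
    g_test = grad(z_test, theta_hat)
    return -(g_test[0] * s[0] + g_test[1] * s[1])


n = len(DATA)
theta_hat = fit([(1.0 / n, zi) for zi in DATA])
scores = [influence_loss(zi, Z_TEST, DATA, theta_hat) for zi in DATA]

# Positive score: upweighting z_i raises the test loss (harmful);
# negative score: upweighting z_i lowers it (helpful).
ranking = sorted(range(n), key=lambda i: scores[i])  # most helpful first
for i in ranking:
    print(f"z_{i} = {DATA[i]}: I_up,loss = {scores[i]:+.4f}")
```

Here $z_3$ gets the most negative score and $z_4$ the most positive one (indices start at 0). That fits the geometry: the test point lies below the fitted line, $z_3$ also lies below it, and $z_4$ lies above it.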

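Besides upweighting a training point, we can also estimate how the parameters change when a training point is perturbed. For $z = (x, y)$, let $z_\delta := (x + \delta, y)$ and move $\epsilon$ of weight from $z$ onto $z_\delta$:
$$\hat{\theta}_{\epsilon, z_\delta, -z} := \arg \min_{\theta \in \Theta} \frac{1}{n}\sum_{i=1}^n L(z_i, \theta) + \epsilon L(z_\delta, \theta) - \epsilon L(z, \theta)$$
$$\left.\frac{d\hat{\theta}_{\epsilon, z_\delta, -z}}{d\epsilon}\right|_{\epsilon=0} = \mathcal{I}_{\text{up, params}}(z_\delta) - \mathcal{I}_{\text{up, params}}(z)$$
This measures how a perturbation $\delta$ to the training point $z$ affects the parameter estimate $\hat{\theta}$; setting $\epsilon = \frac{1}{n}$ corresponds to replacing $z$ with $z_\delta$.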
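
The perturbation formula can be sanity-checked numerically. This pure-Python sketch assumes a toy squared-loss linear regression with made-up data, shifts the input of one training point by a hypothetical $\delta = 0.5$, and compares $\mathcal{I}_{\text{up, params}}(z_\delta) - \mathcal{I}_{\text{up, params}}(z)$ with a finite-difference refit of the objective that moves weight $\epsilon$ from $z$ to $z_\delta$. Helper names are hypothetical.

```python
# Minimal sketch: effect of perturbing one training input, x -> x + delta,
# for a toy linear regression with L(z, theta) = 0.5 * (w*x + b - y)**2.

DATA = [(0.0, 1.0), (1.0, 2.9), (2.0, 5.2), (3.0, 6.8), (4.0, 9.1)]  # z_i = (x_i, y_i)


def solve2(A, v):
    """Solve the 2x2 linear system A s = v by Cramer's rule."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [(d * v[0] - b * v[1]) / det, (a * v[1] - c * v[0]) / det]


def fit(weighted):
    """Closed-form argmin over theta of sum_j c_j * L(z_j, theta), for pairs (c_j, z_j)."""
    a11 = sum(c * x * x for c, (x, _) in weighted)
    a12 = sum(c * x for c, (x, _) in weighted)
    a22 = sum(c for c, _ in weighted)
    r1 = sum(c * x * y for c, (x, y) in weighted)
    r2 = sum(c * y for c, (_, y) in weighted)
    return solve2([[a11, a12], [a12, a22]], [r1, r2])


def grad(z, theta):
    """Gradient of L(z, theta) with respect to theta = (w, b)."""
    (x, y), (w, b) = z, theta
    r = w * x + b - y
    return [r * x, r]


def hessian(data):
    """H = (1/n) * sum_i Hessian of L(z_i, .); constant because L is quadratic."""
    n = len(data)
    sxx = sum(x * x for x, _ in data) / n
    sx = sum(x for x, _ in data) / n
    return [[sxx, sx], [sx, 1.0]]


def influence_params(z, data, theta_hat):
    """I_up,params(z) = -H^{-1} grad L(z, theta_hat)."""
    s = solve2(hessian(data), grad(z, theta_hat))
    return [-s[0], -s[1]]


n = len(DATA)
theta_hat = fit([(1.0 / n, zi) for zi in DATA])
idx, delta = 2, 0.5  # hypothetical choice of point and perturbation
x, y = DATA[idx]
z, z_delta = (x, y), (x + delta, y)

# d theta / d eps at eps = 0 when weight eps moves from z to z_delta.
i_delta = influence_params(z_delta, DATA, theta_hat)
i_z = influence_params(z, DATA, theta_hat)
pert = [a - b for a, b in zip(i_delta, i_z)]

# Finite-difference check: refit with +eps * L(z_delta) - eps * L(z).
eps = 1e-6
theta_eps = fit([(1.0 / n, zi) for zi in DATA] + [(eps, z_delta), (-eps, z)])
fd = [(te - th) / eps for te, th in zip(theta_eps, theta_hat)]
print("perturbation influence:", pert, "finite difference:", fd)
```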

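Section 3 of the paper describes how to compute these quantities efficiently: rather than forming and inverting $H_{\hat{\theta}}$, it relies on implicit Hessian-vector products, combined with either conjugate gradients or a stochastic approximation of $H_{\hat{\theta}}^{-1}$.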
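
Avoiding an explicit $H_{\hat{\theta}}^{-1}$ can be sketched with conjugate gradients (CG), one of the approaches the paper describes. This pure-Python version assumes a toy linear regression with two features plus a bias, squared loss, and made-up data; the Hessian-vector product `hvp` is written by hand here, whereas for a real model it would come from automatic differentiation. All names are hypothetical. It solves $H_{\hat{\theta}}\, s_{\text{test}} = \nabla_\theta L(z_{\text{test}}, \hat{\theta})$ once, touching $H_{\hat{\theta}}$ only through `hvp`, and then scores every training point with one dot product, $\mathcal{I}_{\text{up, loss}}(z, z_{\text{test}}) = -s_{\text{test}}^\top \nabla_\theta L(z, \hat{\theta})$. The same CG routine also fits $\hat{\theta}$ from the normal equations.

```python
# Minimal sketch: s_test = H^{-1} grad L(z_test) via conjugate gradients, using only
# Hessian-vector products. Toy linear regression with two features plus a bias and
# L(z, theta) = 0.5 * (xt . theta - y)**2 (hypothetical data).

# z_i = (features, y); xt = features + [1.0] carries the bias term.
DATA = [((0.0, 1.0), 2.0), ((1.0, 0.0), 2.5), ((1.0, 1.0), 4.1),
        ((2.0, 1.0), 5.0), ((0.0, 2.0), 3.1), ((2.0, 2.0), 6.9)]
Z_TEST = ((1.5, 0.5), 3.6)  # hypothetical test point


def augment(feats):
    """Append a constant 1.0 so the last parameter acts as the bias."""
    return list(feats) + [1.0]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def grad(z, theta):
    """Gradient of L(z, theta) = 0.5 * (xt . theta - y)**2 with respect to theta."""
    feats, y = z
    xt = augment(feats)
    r = dot(xt, theta) - y
    return [r * a for a in xt]


def hvp(data, v):
    """H v = (1/n) * sum_i xt_i * (xt_i . v), without ever forming H."""
    out = [0.0] * len(v)
    for feats, _ in data:
        xt = augment(feats)
        c = dot(xt, v) / len(data)
        out = [o + c * a for o, a in zip(out, xt)]
    return out


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=100):
    """Solve A s = b for symmetric positive-definite A, given only s -> A s."""
    s = [0.0] * len(b)
    r = list(b)  # residual b - A s for the initial guess s = 0
    p = list(r)
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:
            break
        ap = matvec(p)
        alpha = rs / dot(p, ap)
        s = [si + alpha * pi for si, pi in zip(s, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return s


def hessian_vec(v):
    """The only access to H: a Hessian-vector product on the training set."""
    return hvp(DATA, v)


n = len(DATA)
# For squared loss the minimizer solves H theta = (1/n) * sum_i xt_i * y_i.
rhs = [sum(augment(f)[k] * y for f, y in DATA) / n for k in range(3)]
theta_hat = conjugate_gradient(hessian_vec, rhs)

# Solve once for s_test; each training point then costs one gradient and one dot product.
s_test = conjugate_gradient(hessian_vec, grad(Z_TEST, theta_hat))
scores = [-dot(s_test, grad(z, theta_hat)) for z in DATA]  # I_up,loss(z_i, z_test)
print(scores)
```

Because $s_{\text{test}}$ is reused, the cost per training point is one gradient and one dot product, which is what makes scoring a full training set feasible.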

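These tools can support several interpretability and debugging tasks, such as understanding model behavior, crafting training-set attacks, debugging domain mismatch, and fixing mislabeled training examples.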
