Deep k-Nearest Neighbors: Towards Confident, Interpretable and Robust Deep Learning
Nicolas Papernot and Patrick McDaniel, 2018
Paper summary by davidstutz
Papernot and McDaniel introduce deep k-nearest neighbors (DkNN), where nearest neighbors are found at each intermediate layer in order to improve interpretability and robustness. Personally, I really enjoyed reading this paper. I will therefore discuss not only the proposed method but also highlight some ideas from their thorough survey and experimental results.
First, Papernot and McDaniel provide a quite thorough survey of relevant work in three disciplines: confidence, interpretability and robustness. To the best of my knowledge, this is one of the few papers that explicitly connect these three disciplines. The work on confidence is especially interesting in light of robustness, as Papernot and McDaniel also frequently distinguish between in-distribution and out-of-distribution samples. In this context, it is well known that deep neural networks are overconfident on samples far away from the data distribution.
The deep k-nearest neighbor approach is described in Algorithm 1 and summarized below. Given a trained model, a set of labeled training samples and a test input, they first find the k nearest training samples to the test input in the representation space of each layer of the network. The nonconformity of the test input with a specific label $j$, referred to as $\alpha$ in Algorithm 1, is the number of these neighbors, summed over all layers, whose label differs from $j$. The prediction can then be refined by comparing these nonconformity values to a set of reference values computed over a set of labeled calibration data. In particular, the probability (an empirical p-value) for label $j$ is the fraction of reference nonconformity values that are greater than or equal to the computed one. See Algorithm 1 or the paper for details.
Algorithm 1: The deep k-nearest neighbor algorithm and an illustration.
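The inference step can be made concrete with a minimal NumPy sketch. Following Algorithm 1, the nonconformity is $\alpha(x, j) = \sum_{\lambda} |\{i \in \Omega_\lambda : i \neq j\}|$, where $\Omega_\lambda$ holds the labels of the $k$ neighbors at layer $\lambda$. The p-value is $p_j(x) = |\{\alpha \in A : \alpha \geq \alpha(x, j)\}| / |A|$ for the set $A$ of calibration scores. The prediction is $\arg\max_j p_j(x)$, the confidence is one minus the second-largest p-value, and the credibility is the largest p-value. The sketch makes two assumptions that are not taken from the paper. First, neighbors are found by brute-force Euclidean search instead of the paper's locality-sensitive hashing. Second, `toy_layer_representations` is a hypothetical two-layer ReLU network that stands in for a trained model.

```python
import numpy as np


def toy_layer_representations(x):
    """Hypothetical stand-in for a trained DNN: returns each hidden layer's output.

    Two fixed ReLU layers; accepts a single input (d,) or a batch (n, d).
    """
    w1 = np.array([[1.0, -1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0, -1.0]])
    w2 = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
    h1 = np.maximum(x @ w1, 0.0)
    h2 = np.maximum(h1 @ w2, 0.0)
    return [h1, h2]


def neighbor_labels(train_reps, train_labels, query_reps, k):
    """Labels of the k nearest training points at every layer, shape (layers, k).

    Brute-force Euclidean search (assumption; the paper uses LSH).
    """
    labels = []
    for feats, q in zip(train_reps, query_reps):
        dists = np.linalg.norm(feats - q, axis=1)
        labels.append(train_labels[np.argsort(dists)[:k]])
    return np.stack(labels)


def nonconformity(labels, j):
    """alpha(x, j): number of neighbors, summed over all layers, not labeled j."""
    return int(np.sum(labels != j))


def calibrate(train_reps, train_labels, calib_reps, calib_labels, k):
    """Reference set A: nonconformity of each calibration input with its true label."""
    scores = []
    for i, y in enumerate(calib_labels):
        query = [feats[i] for feats in calib_reps]
        scores.append(nonconformity(neighbor_labels(train_reps, train_labels, query, k), y))
    return np.array(scores)


def dknn_predict(train_reps, train_labels, calib_scores, query_reps, num_classes, k):
    """Prediction, confidence, credibility and per-label p-values for one input."""
    labels = neighbor_labels(train_reps, train_labels, query_reps, k)
    # p_j: fraction of calibration scores at least as large as alpha(x, j)
    p = np.array([np.mean(calib_scores >= nonconformity(labels, j))
                  for j in range(num_classes)])
    pred = int(np.argmax(p))
    confidence = float(1.0 - np.max(np.delete(p, pred)))  # 1 - second-largest p-value
    credibility = float(p[pred])                          # largest p-value
    return pred, confidence, credibility, p


# Usage on two well-separated Gaussian blobs (synthetic toy data).
rng = np.random.default_rng(0)
centers = np.array([[5.0, 5.0], [-5.0, -5.0]])


def sample(n_per_class):
    x = np.concatenate([c + 0.5 * rng.standard_normal((n_per_class, 2)) for c in centers])
    return x, np.repeat(np.arange(len(centers)), n_per_class)


x_train, y_train = sample(20)
x_calib, y_calib = sample(10)
train_reps = toy_layer_representations(x_train)
calib_scores = calibrate(train_reps, y_train, toy_layer_representations(x_calib), y_calib, k=5)

pred, conf, cred, p = dknn_predict(train_reps, y_train, calib_scores,
                                   toy_layer_representations(np.array([5.0, 5.0])),
                                   num_classes=2, k=5)
print(pred, conf, cred)  # 0 1.0 1.0
```

In this toy setup, every calibration input draws all of its neighbors from its own class, so all calibration scores are 0. An input at a blob center therefore receives credibility 1. A point such as (5, -5), which lies between the blobs, will usually draw neighbors from both classes at some layer. In that case its nonconformity is positive for every label, so its p-values and credibility drop to 0.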
Finally, they provide experimental results, again considering the three disciplines of confidence/credibility, interpretability and robustness. The main takeaway is that the resulting confidences are more reliable on out-of-distribution samples, including adversarial examples. Additionally, the nearest neighbors allow a very basic interpretation of the predictions.
Also find this summary at [davidstutz.de](https://davidstutz.de/category/reading/).
First published: 2018/03/13
Abstract: Deep neural networks (DNNs) enable innovative applications of machine
learning like image recognition, machine translation, or malware detection.
However, deep learning is often criticized for its lack of robustness in
adversarial settings (e.g., vulnerability to adversarial inputs) and general
inability to rationalize its predictions. In this work, we exploit the
structure of deep learning to enable new learning-based inference and decision
strategies that achieve desirable properties such as robustness and
interpretability. We take a first step in this direction and introduce the Deep
k-Nearest Neighbors (DkNN). This hybrid classifier combines the k-nearest
neighbors algorithm with representations of the data learned by each layer of
the DNN: a test input is compared to its neighboring training points according
to the distance that separates them in the representations. We show the labels
of these neighboring points afford confidence estimates for inputs outside the
model's training manifold, including on malicious inputs like adversarial
examples--and therein provides protections against inputs that are outside the
model's understanding. This is because the nearest neighbors can be used to
estimate the nonconformity of, i.e., the lack of support for, a prediction in
the training data. The neighbors also constitute human-interpretable
explanations of predictions. We evaluate the DkNN algorithm on several
datasets, and show the confidence estimates accurately identify inputs outside
the model, and that the explanations provided by nearest neighbors are
intuitive and useful in understanding model failures.