Learning Confidence for Out-of-Distribution Detection in Neural Networks on ShortScience.org

Learning Confidence for Out-of-Distribution Detection in Neural Networks
Terrance DeVries and Graham W. Taylor
arXiv e-Print archive - 2018 via Local arXiv
Keywords: stat.ML, cs.LG

[link] Summary by elbaro 5 years ago

## Summary
The prior work 'On Calibration of Modern Neural Networks' uses temperature scaling to output a calibrated confidence. That is applied post hoc at the inference stage and does not change the existing classifier. This paper instead handles confidence at the training stage: the network learns to output a confidence estimate directly.

## Architecture
An additional confidence branch is added after the penultimate layer, in parallel to the logits and class probabilities (Figure 2).

https://i.imgur.com/vtKq9g0.png
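
A minimal sketch of the two-headed output, to make the wiring concrete. It assumes the confidence branch is a single linear layer followed by a sigmoid, and that both heads read the same penultimate features; the summary does not fix these details, and all names (`two_headed_output`, `cls_w`, `conf_w`, ...) are hypothetical.

```python
import math
import random

def linear(x, weights, bias):
    """Dense layer: weights holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def two_headed_output(features, cls_w, cls_b, conf_w, conf_b):
    """Both heads consume the same penultimate-layer features."""
    probs = softmax(linear(features, cls_w, cls_b))  # class probabilities p
    # Assumption: one linear unit + sigmoid gives the scalar c in (0, 1).
    confidence = sigmoid(linear(features, [conf_w], [conf_b])[0])
    return probs, confidence

# Toy features and weights, only to exercise the functions.
rng = random.Random(0)
features = [rng.uniform(-1, 1) for _ in range(4)]
cls_w = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
cls_b = [0.0, 0.0, 0.0]
conf_w = [rng.uniform(-1, 1) for _ in range(4)]
conf_b = 0.0

probs, confidence = two_headed_output(features, cls_w, cls_b, conf_w, conf_b)
print(probs, confidence)
```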

## Training
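The network outputs the class probabilities $p$ and a confidence $c$, which is a single scalar. During training the probabilities are interpolated toward the label: $p'=c\cdot p+(1-c)y$, where $y$ is the one-hot label and acts as a hint. The confidence loss is $\mathcal{L}_c=-\log c$ and the task loss (NLL) is $\mathcal{L}_t=-\sum_i \log(p'_i)\,y_i$. The confidence loss makes hints costly; without it the network could always set $c=0$ and copy the label.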
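
A small worked example of the two loss terms, as a sketch rather than the authors' code. The probabilities, label, and function names are made up for illustration. Lower $c$ pulls $p'$ toward the label (smaller task loss) but raises the confidence penalty.

```python
import math

def interpolate(p, y, c):
    """p' = c * p + (1 - c) * y, element-wise."""
    return [c * p_i + (1.0 - c) * y_i for p_i, y_i in zip(p, y)]

def task_loss(p_prime, y):
    """NLL on the interpolated probabilities; only the label entry contributes."""
    return -sum(y_i * math.log(p_i) for p_i, y_i in zip(p_prime, y) if y_i > 0)

def confidence_loss(c):
    """Penalty for asking for hints: -log c."""
    return -math.log(c)

p = [0.2, 0.7, 0.1]   # toy prediction that is wrong about class 0
y = [1.0, 0.0, 0.0]   # one-hot label

for c in (0.9, 0.5, 0.1):
    p_prime = interpolate(p, y, c)
    print(f"c={c}: L_t={task_loss(p_prime, y):.4f}, L_c={confidence_loss(c):.4f}")
```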

### Budget Parameter
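The authors introduce a weight $\lambda$ on the confidence loss, so the total loss is $\mathcal{L}=\mathcal{L}_t+\lambda\mathcal{L}_c$, and a budget $\beta$. $\lambda$ is adjusted during training: if $\mathcal{L}_c>\beta$, $\lambda$ is increased; if $\mathcal{L}_c<\beta$, it is decreased. Values of $\beta$ in $[0.1, 1.0]$ are found to be reasonable.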
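
The budget rule can be sketched as a small controller on $\lambda$. The multiplicative step `factor=1.01` and the function name are assumptions for illustration; the summary only states the direction of the update.

```python
def update_lambda(lmbda, conf_loss, beta, factor=1.01):
    """Nudge the confidence-loss weight so that L_c drifts toward the budget beta.

    Assumption: a multiplicative step; the summary only gives the direction.
    """
    if conf_loss > beta:
        return lmbda * factor   # hints are used too often: make them more expensive
    if conf_loss < beta:
        return lmbda / factor   # hints are used too rarely: make them cheaper
    return lmbda

lmbda = 0.1
beta = 0.3
# Toy sequence of observed confidence losses, one per weight update.
for conf_loss in (0.8, 0.6, 0.4, 0.2, 0.25):
    lmbda = update_lambda(lmbda, conf_loss, beta)
    print(f"L_c={conf_loss:.2f} -> lambda={lmbda:.5f}")
```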

### Hinting with 50%
Sometimes the model leans on the free label (driving $c$ toward 0) instead of fitting the complicated structure of the data. To prevent this, the authors give hints only 50% of the time, so the model cannot rely entirely on them: $p'$ is used for only half of the batches in each epoch.
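
One way to sketch the 50% hinting is a per-batch coin flip that decides whether the task loss sees $p'$ or the raw $p$. The per-batch flip, the helper name `predictions_for_loss`, and the toy batch are assumptions for illustration.

```python
import random

def predictions_for_loss(batch_p, batch_y, batch_c, use_hint):
    """Return p' = c*p + (1-c)*y for a hinted batch, or plain p otherwise."""
    if not use_hint:
        return [list(p) for p in batch_p]
    return [
        [c * p_i + (1.0 - c) * y_i for p_i, y_i in zip(p, y)]
        for p, y, c in zip(batch_p, batch_y, batch_c)
    ]

rng = random.Random(0)
batch_p = [[0.2, 0.8], [0.6, 0.4]]   # toy class probabilities
batch_y = [[1.0, 0.0], [1.0, 0.0]]   # one-hot labels
batch_c = [0.5, 0.9]                 # predicted confidences

num_batches = 1000
hinted = 0
for _ in range(num_batches):
    use_hint = rng.random() < 0.5    # hint roughly half of the batches
    hinted += use_hint
    preds = predictions_for_loss(batch_p, batch_y, batch_c, use_hint)

print(f"hinted {hinted} of {num_batches} batches")
```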

### Misclassified Examples

A high-capacity network trained on a small dataset overfits easily, leaving few misclassified training samples. Misclassified samples are required to learn the confidence, since these are the samples the network is likely to assign low confidence to. The paper uses aggressive data augmentation to create difficult examples.

## Inference
Reject an input if $c\le\delta$, where $\delta$ is a detection threshold.
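
The rejection rule is a plain threshold on the predicted confidence. In this sketch the confidences and the value of `delta` are made-up numbers.

```python
def is_rejected(confidence, delta):
    """Flag an input as out-of-distribution when c <= delta."""
    return confidence <= delta

confidences = [0.95, 0.40, 0.72, 0.10]  # toy outputs of the confidence branch
delta = 0.5                             # hypothetical threshold

flags = [is_rejected(c, delta) for c in confidences]
print(flags)
```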

For out-of-distribution detection, they use the same input perturbation as ODIN (2018). ODIN relies on temperature scaling and thresholds the max softmax probability, while this paper does not need temperature scaling since the network outputs $c$ directly. In the paper's evaluation, this method outperformed ODIN.
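
A toy sketch of an ODIN-style perturbation applied to the confidence output. It assumes the input is moved a small step against the sign of the gradient of $\mathcal{L}_c=-\log c$, i.e. $\tilde{x}=x-\epsilon\,\mathrm{sign}(\nabla_x\mathcal{L}_c)$, which raises $c$. The linear-sigmoid confidence model, its weights, and `epsilon` are stand-ins for illustration, not the paper's network.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def confidence(x, w, b):
    """Toy confidence model: c(x) = sigmoid(w . x + b)."""
    return sigmoid(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)

def grad_conf_loss(x, w, b):
    """Analytic gradient of -log c(x) w.r.t. x for the toy model."""
    c = confidence(x, w, b)
    return [-(1.0 - c) * w_i for w_i in w]

def sign(v):
    return (v > 0) - (v < 0)

def perturb(x, w, b, epsilon):
    """x_tilde = x - epsilon * sign(grad_x L_c): a small step that raises c."""
    grad = grad_conf_loss(x, w, b)
    return [x_i - epsilon * sign(g_i) for x_i, g_i in zip(x, grad)]

w, b = [1.0, -2.0, 0.5], 0.0   # hypothetical weights
x = [0.1, 0.2, -0.3]           # toy input
x_tilde = perturb(x, w, b, epsilon=0.01)

print(confidence(x, w, b), confidence(x_tilde, w, b))
```

The perturbed confidence `confidence(x_tilde, w, b)` would then be the value compared against $\delta$.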




## Reference
ODIN: [Enhancing The Reliability of Out-of-distribution Image Detection in Neural Networks](http://www.shortscience.org/paper?bibtexKey=journals/corr/1706.02690#elbaro)
