[link]
## Summary In a prior work 'On Calibration of Modern Nueral Networks', temperature scailing is used for outputing confidence. This is done at inference stage, and does not change the existing classifier. This paper considers the confidence at training stage, and directly outputs the confidence from the network. ## Architecture An additional branch for confidence is added after the penultimate layer, in parallel to logits and probs (Figure 2). https://i.imgur.com/vtKq9g0.png ## Training The network outputs the prob $p$ and the confidence $c$ which is a single scalar. The modified prob $p'=c*p+(1c)y$ where $y$ is the label (hint). The confidence loss is $\mathcal{L}_c=\log c$, the NLL is $\mathcal{L}_t= \sum \log(p'_i)y_i$. ### Budget Parameter The authors introduced the confidence loss weight $\lambda$ and a budget $\beta$. If $\mathcal{L}_c>\beta$, increase $\lambda$, if $\mathcal{L}_c<\beta$, decrease $\lambda$. $\beta$ is found reasonable in [0.1,1.0]. ### Hinting with 50% Sometimes the model relies on the free label ($c=0$) and does not fit the complicated structure of data. The authors give hints with 50% so the model cannot rely 100% on the hint. They used $p'$ for only half of the bathes for each epoch. ### Misclassified Examples A highcapacity network with small dataset overfits well, and misclassified samples are required to learn the confidence. The network likely assigns low confidence to samples. The paper used an aggressive data augmentation to create difficult examples. ## Inference Reject if $c\le\delta$. For outofdistribution detection, they used the same input perturbation as in ODIN (2018). ODIN used temperature scailing and used the max prob, while this paper does not need temperature scailing since it directly outputs $c$. In evaluation, this paper outperformed ODIN. ## Reference ODIN: [Enhancing The Reliability of Outofdistribution Image Detection in Neural Networks](http://www.shortscience.org/paper?bibtexKey=journals/corr/1706.02690#elbaro)
Your comment:
