elbaro's profile - ShortScience.org

arxiv.org
arxiv-vanity.com
scholar.google.com

Learning Confidence for Out-of-Distribution Detection in Neural Networks
Terrance DeVries and Graham W. Taylor
arXiv e-Print archive - 2018 via Local arXiv
Keywords: stat.ML, cs.LG
more

[link] Summary by elbaro 5 years ago

## Summary
In a prior work 'On Calibration of Modern Nueral Networks', temperature scailing is used for outputing confidence. This is done at inference stage, and does not change the existing classifier. This paper considers the confidence at training stage, and directly outputs the confidence from the network.

## Architecture
An additional branch for confidence is added after the penultimate layer, in parallel to logits and probs (Figure 2).

https://i.imgur.com/vtKq9g0.png

## Training
The network outputs the prob $p$ and the confidence $c$ which is a single scalar. The modified prob $p'=c*p+(1-c)y$ where $y$ is the label (hint). The confidence loss is $\mathcal{L}_c=-\log c$, the NLL is $\mathcal{L}_t= -\sum \log(p'_i)y_i$.

### Budget Parameter
The authors introduced the confidence loss weight $\lambda$ and a budget $\beta$. If $\mathcal{L}_c>\beta$, increase $\lambda$, if $\mathcal{L}_c<\beta$, decrease $\lambda$. $\beta$ is found reasonable in [0.1,1.0].

### Hinting with 50%
Sometimes the model relies on the free label ($c=0$) and does not fit the complicated structure of data. The authors give hints with 50% so the model cannot rely 100% on the hint. They used $p'$ for only half of the bathes for each epoch.

### Misclassified Examples

A high-capacity network with small dataset overfits well, and mis-classified samples are required to learn the confidence. The network likely assigns low confidence to samples. The paper used an aggressive data augmentation to create difficult examples.

## Inference
Reject if $c\le\delta$.

For out-of-distribution detection, they used the same input perturbation as in ODIN (2018). ODIN used temperature scailing and used the max prob, while this paper does not need temperature scailing since it directly outputs $c$. In evaluation, this paper outperformed ODIN.




## Reference
ODIN: [Enhancing The Reliability of Out-of-distribution Image Detection in Neural Networks](http://www.shortscience.org/paper?bibtexKey=journals/corr/1706.02690#elbaro)

arxiv.org
arxiv-vanity.com
scholar.google.com

Enhancing The Reliability of Out-of-distribution Image Detection in Neural Networks
Shiyu Liang and Yixuan Li and R. Srikant
arXiv e-Print archive - 2017 via Local arXiv
Keywords: cs.LG, stat.ML
more

[link] Summary by elbaro 5 years ago

## Task
Add '**rejection**' output to an existing classification model with softmax layer.

## Method
1. Choose some threshold $\delta$ and temperature $T$
2. Add a perturbation to the input x (eq 2),
let $\tilde x = x - \epsilon \text{sign}(-\nabla_x \log S_{\hat y}(x;T))$
3. If $p(\tilde x;T)\le \delta$, rejects
4. If not, return the output of the original classifier

$p(\tilde x;T)$ is the max prob with temperature scailing for input $\tilde x$

$\delta$ and $T$ are manually chosen.

arxiv.org
arxiv-vanity.com
scholar.google.com

On Calibration of Modern Neural Networks
Chuan Guo and Geoff Pleiss and Yu Sun and Kilian Q. Weinberger
arXiv e-Print archive - 2017 via Local arXiv
Keywords: cs.LG
more

[link] Summary by elbaro 5 years ago

## Task

A neural network for classification typically has a **softmax** layer and outputs the class with the max probability. However, this probability does not represent the **confidence**. If the average confidence (average of max probs) for a dataset matches the accuracy, it is called **well-calibrated**. Old models like LeNet (1998) was well-calibrated, but modern networks like ResNet (2016) are no longer well-calibrated. This paper explains what caused this and compares various calibration methods.

## Figure - Confidence Histogram

https://i.imgur.com/dMtdWsL.png

The bottom row: group the samples by confidence (max probailities) into bins, and calculates the accuracy (# correct / # bin size) within each bin.

- ECE (Expected Calibration Error): average of |accuracy-confidence| of bins
- MCE (Maximum Calibration Error): max of |accuracy-confidence| of bins

## Analysis - What
The paper experiments how models are mis-calibrated with different factors: (1) model capacity, (2) batch norm, (3) weight decay, (4) NLL.

## Solution - Calibration Methods

Many calibration methods for binary classification and multi-class classification are evaluated. The method that performed the best is **temperature scailing**, which simply multiplies logits before the softmax by some constant. The paper used the validation set to choose the best constant.

arxiv.org
arxiv-vanity.com
scholar.google.com

Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
Peter Anderson and Xiaodong He and Chris Buehler and Damien Teney and Mark Johnson and Stephen Gould and Lei Zhang
arXiv e-Print archive - 2017 via Local arXiv
Keywords: cs.CV
more

[link] Summary by elbaro 6 years ago

This paper solves two tasks: Image Captioning and VQA.
The main idea is to use Faster R-CNN to embed images (kx2048 from k bounding boxes) instead of ResNet (14x14x2048) and apply attention over k vectors.

For **VQA**, this is basically (Faster R-CNN + ShowAttendAskAnswer). SAAA(ShowAskAttendAnswer) calculates a 2D attention map from the concatenation of a text vector (2048-dim from LSTM) and image tensor (2048x14x14 from ResNet). This image feature can be thought as a collection of 2048-dim feature vectors. This paper uses Faster R-CNN to get k bounding boxes. Each bounding box is a 2048-dim vector so we have kx2048, which is fed to SAAA.


**SAAA**:
https://i.imgur.com/2FnPXi0.png

**This paper (VQA)**:
https://i.imgur.com/xib77Iy.png

For **Image Captioning**, it uses 2-layer LSTM. The first layer gets the average of k 2048-dim vectors. The output is used to calculate the attention weights over k vectors. The second layer gets the weight-averaged 2048-dim vector and the output of the first layer.

https://i.imgur.com/GeXaC30.png

elbaro

sciscore: 2.5