Summaries from International Conference on Machine Learning on ShortScience.org

Optimizing Classifier Performance via an Approximation to the Wilcoxon-Mann-Whitney Statistic
Yan, Lian and Dodier, Robert H. and Mozer, Michael and Wolniewicz, Richard H.
International Conference on Machine Learning - 2003 via Local Bibsonomy
Keywords: dblp

Summary by Prateek Gupta 4 years ago

In a binary classification task on an imbalanced dataset, we often report the *area under the curve* (AUC) of the *receiver operating characteristic* (ROC) as a measure of the classifier's ability to distinguish the two classes.
If there are $k$ errors, accuracy is the same irrespective of how those $k$ errors are made, i.e. whether positive samples or negative samples are misclassified.
AUC-ROC instead distinguishes between these two kinds of misclassification, which makes it an appropriate statistic for classification tasks on imbalanced datasets.
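
To make the point about accuracy concrete, here is a small sketch on made-up data (10 positives, 90 negatives; the labels, predictions, and helper names are all illustrative assumptions, not taken from the paper). Both sets of predictions contain $k = 5$ errors, so their accuracy is identical, yet one of them misses half of the positive class.

```python
import numpy as np

# Hypothetical imbalanced dataset: 10 positives followed by 90 negatives.
y_true = np.array([1] * 10 + [0] * 90)

# Case A: all 5 errors fall on positive samples.
pred_a = y_true.copy()
pred_a[:5] = 0

# Case B: all 5 errors fall on negative samples.
pred_b = y_true.copy()
pred_b[10:15] = 1


def accuracy(y, p):
    """Fraction of samples whose predicted label matches the true label."""
    return float(np.mean(y == p))


def recall(y, p, label):
    """Fraction of samples of class `label` that are predicted as `label`."""
    mask = y == label
    return float(np.mean(p[mask] == label))


for name, pred in [("A", pred_a), ("B", pred_b)]:
    print(
        name,
        "accuracy:", accuracy(y_true, pred),
        "positive recall:", recall(y_true, pred, 1),
        "negative recall:", recall(y_true, pred, 0),
    )
```

Accuracy is 0.95 in both cases, while the recall on the positive class is 0.5 in case A and 1.0 in case B.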

However, until this paper, AUC-ROC was hard to optimize directly: it is not differentiable, so gradient descent could not be applied to it.
The paper expresses AUC-ROC through the Wilcoxon-Mann-Whitney statistic, which counts the "number of wins" over all pairwise comparisons between positive and negative samples:
$
U = \frac{\sum_{i=1}^{m}\sum_{j=1}^{n}I(x_i, x_j)}{mn},
$
where $m$ is the number of positive samples, $n$ is the number of negative samples, and $I(x_i, x_j)$ is $1$ if the positive sample $x_i$ is ranked higher than the negative sample $x_j$, and $0$ otherwise.
Figure 1 in the paper shows how the variance of this statistic changes with increasing imbalance in the dataset, justifying the close correspondence with AUC-ROC.
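
The statistic $U$ can be computed directly from classifier scores. The following is a minimal NumPy sketch, not the authors' code; the function name `wmw_statistic` and the example scores are assumptions made for illustration. Ties are counted as losses here, following the "ranked higher" definition above.

```python
import numpy as np


def wmw_statistic(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs in which the positive
    sample receives a strictly higher score than the negative one."""
    pos = np.asarray(pos_scores, dtype=float)
    neg = np.asarray(neg_scores, dtype=float)
    # Broadcasting builds an m x n matrix of all pairwise comparisons.
    wins = pos[:, None] > neg[None, :]
    return float(wins.sum()) / (pos.size * neg.size)


# Made-up scores: m = 3 positives, n = 2 negatives.
pos = [0.9, 0.8, 0.3]
neg = [0.7, 0.2]
# 5 of the 6 pairs are wins; only (0.3, 0.7) is a loss.
print(wmw_statistic(pos, neg))
```

The pairwise matrix costs $O(mn)$ memory, which is fine for a sketch but may need batching or a rank-based computation on large datasets.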

To make this metric smooth and differentiable, the step function in the pairwise comparison is replaced by a sigmoid or a hinge function.
Further extensions apply this to multi-class classification tasks and focus on top-K predictions, i.e. optimizing the lower-left part of the ROC curve.
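
As a rough illustration of replacing the step function, the sketch below swaps the hard comparison for a sigmoid (a smooth score to maximize) and for a hinge-style penalty (a loss to minimize). The function names and the parameters `beta`, `margin`, and `power` are assumptions chosen for this example; the exact functional forms and settings in the paper may differ.

```python
import numpy as np


def pairwise_diffs(pos_scores, neg_scores):
    """m x n matrix of score differences (positive minus negative)."""
    pos = np.asarray(pos_scores, dtype=float)
    neg = np.asarray(neg_scores, dtype=float)
    return pos[:, None] - neg[None, :]


def sigmoid_surrogate(pos_scores, neg_scores, beta=10.0):
    """Smooth stand-in for U: each step I(.) becomes a sigmoid of the
    score difference. Larger beta gives a sharper approximation.
    Higher is better."""
    d = pairwise_diffs(pos_scores, neg_scores)
    return float(np.mean(1.0 / (1.0 + np.exp(-beta * d))))


def hinge_surrogate_loss(pos_scores, neg_scores, margin=0.3, power=2):
    """Hinge-style penalty: a pair contributes only when the positive
    score does not beat the negative score by at least `margin`.
    Lower is better."""
    d = pairwise_diffs(pos_scores, neg_scores)
    return float(np.mean(np.maximum(0.0, margin - d) ** power))


# Made-up scores for a well-separated and a badly ordered classifier.
good_pos, good_neg = [5.0, 6.0], [0.0, 1.0]
bad_pos, bad_neg = [0.0, 1.0], [5.0, 6.0]

print(sigmoid_surrogate(good_pos, good_neg), hinge_surrogate_loss(good_pos, good_neg))
print(sigmoid_surrogate(bad_pos, bad_neg), hinge_surrogate_loss(bad_pos, bad_neg))
```

Both surrogates are differentiable almost everywhere in the scores, so in a training setup they could be written in an autodiff framework and optimized with gradient descent.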