Regret to the Best vs. Regret to the AverageRegret to the Best vs. Regret to the AverageEven-Dar, Eyal and Kearns, Michael and Mansour, Yishay and Wortman, Jennifer2007
Paper summarycdmurray80Without a doubt this is one of the better papers I've read on this topic: it's well written and their proofs (except the last technical one) are relatively straightforward. This also cites other useful papers on expert learning (especially the first two references).
The authors work in the experts allocation framework, where each period each of N experts has a gain in [0,1], and the algorithm must choose a weighting w on the probability simplex and will receive gain w \cdot g, where g is a vector of the experts' gains. The algorithms total gain is the sum of its gains over T time steps. The authors define regret to the best (average, worst) as the amount by which the algorithm's total gain lags that of the best (average, worst) expert. They point out that, while traditional expert weighting algorithm have regret $O(T^{0.5})$ to the best, they also have regret $\Omega(T^{0.5})$ to the average and worst expert. They show that any algorithm with $O(T^0.5)$ regret to best has $\Omega(T^{0.5})$ regret to worst (more detailed bounds are provided) They develop an algorithm with $O((N\cdot T)^{0.5} \cdot log(T))$ regret to best and constant regret to average: thus giving up a little regret to the best expert for much better regret to the average. The first of these, algorithms, BestAverage, basically runs an exponential weights algorithm, resetting whenever the best expert diverges sufficiently from the average.
Without a doubt this is one of the better papers I've read on this topic: it's well written and their proofs (except the last technical one) are relatively straightforward. This also cites other useful papers on expert learning (especially the first two references).
The authors work in the experts allocation framework, where each period each of N experts has a gain in [0,1], and the algorithm must choose a weighting w on the probability simplex and will receive gain w \cdot g, where g is a vector of the experts' gains. The algorithms total gain is the sum of its gains over T time steps. The authors define regret to the best (average, worst) as the amount by which the algorithm's total gain lags that of the best (average, worst) expert. They point out that, while traditional expert weighting algorithm have regret $O(T^{0.5})$ to the best, they also have regret $\Omega(T^{0.5})$ to the average and worst expert. They show that any algorithm with $O(T^0.5)$ regret to best has $\Omega(T^{0.5})$ regret to worst (more detailed bounds are provided) They develop an algorithm with $O((N\cdot T)^{0.5} \cdot log(T))$ regret to best and constant regret to average: thus giving up a little regret to the best expert for much better regret to the average. The first of these, algorithms, BestAverage, basically runs an exponential weights algorithm, resetting whenever the best expert diverges sufficiently from the average.