How to use expert advice — Cesa-Bianchi, Nicolò; Freund, Yoav; Helmbold, David P.; Haussler, David; Schapire, Robert E.; Warmuth, Manfred K. (1993)

Paper summary by cdmurray80

The authors consider online learning of binary outcomes, where in each period each of $N$ experts makes a prediction (in $[0,1]$), and the learner must make predictions such that, in hindsight, the learner didn't do much worse than the best expert. The loss function the authors use is $|\hat{Y}_t - Y_t|$, and the total loss is the sum over all $t$.
The authors' first solution to this problem is an algorithm called MM. It uses a complex recursively defined function $v$ which takes exponential time to compute, but which (roughly) estimates each expert's anticipated total loss over the rest of the game and predicts accordingly. The bound they give for MM is stated in terms of this $v$ function, so it isn't easily interpreted. The algorithm is analyzed in tremendous detail and under various assumptions.
The authors then give a more familiar (and implementable) multiplicative-weight update algorithm: a non-negative weight is placed on each expert, and the prediction at any time is the weighted average of the experts' predictions. After every period, each expert's weight is multiplied by a function of the loss it incurred that period and a learning rate. They show that, when the weight updates are done using the right function, the algorithm has a nice Bayesian interpretation. The paper is dense (59 pages) and so filled with proofs it feels like reading an appendix.
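One round of such a forecaster can be sketched as follows. This is a minimal illustration, not the paper's exact update: the exponential factor $e^{-\eta \cdot \text{loss}}$ and the learning rate `eta = 0.5` are assumptions standing in for the paper's family of update functions.

```python
import math

def predict_and_update(weights, expert_preds, outcome, eta=0.5):
    """One round of a multiplicative-weights forecaster.

    Illustrative sketch: the exp(-eta * loss) update and eta value are
    assumptions, not the specific update function analyzed in the paper.
    """
    total = sum(weights)
    # Predict the weighted average of the experts' predictions in [0, 1].
    prediction = sum(w * p for w, p in zip(weights, expert_preds)) / total
    # Shrink each expert's weight exponentially in its absolute loss |p - y|.
    new_weights = [w * math.exp(-eta * abs(p - outcome))
                   for w, p in zip(weights, expert_preds)]
    return prediction, new_weights

# Usage: three experts over two rounds of binary outcomes.
weights = [1.0, 1.0, 1.0]
for preds, y in [([0.9, 0.2, 0.5], 1), ([0.8, 0.1, 0.5], 1)]:
    y_hat, weights = predict_and_update(weights, preds, y)
```

After a few rounds the expert with the smallest cumulative loss dominates the weighted average, which is the mechanism behind the regret bounds relative to the best expert in hindsight.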