[link]
The authors consider online learning of binary outcomes, where in each period each of $N$ experts makes a prediction (in $[0,1]$), and the learner must make predictions such that, in hindsight, it didn't do much worse than the best expert. The loss function the authors use is the absolute loss $|\hat{Y}_t - Y_t|$, and the total loss is the sum over all $t$.

The authors' first solution to this problem is an algorithm called MM. It uses a complex recursively defined function $v$ which takes exponential time to compute, but which (roughly) computes the anticipated future total loss of each expert for the rest of the game and uses that. The bound they give for MM is in terms of this $v$ function, so it isn't easily interpreted. This is analyzed in tremendous detail and under various assumptions.

The authors then give a more familiar (and implementable) multiplicative-weight-update algorithm, where a certain nonnegative weight is placed on each expert and the prediction at any time is the weighted average of the experts' predictions. After every period, each expert's weight is multiplied by some function of the loss it incurred that period and a learning rate. They show how, when the weight updates are done using the right function, the algorithm has a nice Bayesian interpretation.

This paper is dense (at 59 pages) and so filled with proofs it feels like reading an appendix.
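A minimal sketch of the multiplicative-weight scheme described above. Note this is an assumption-laden illustration, not the paper's exact algorithm: it uses the common exponential update factor $e^{-\eta \cdot \text{loss}}$, whereas the paper's "right function" for the update is different; the learning rate `eta` and the function name are mine.

```python
import math

def predict_with_experts(expert_preds, outcomes, eta=1.0):
    """Run weighted-average prediction over a sequence of rounds.

    expert_preds: list of rounds, each a list of N predictions in [0, 1].
    outcomes: list of observed outcomes Y_t in [0, 1].
    eta: learning rate (hypothetical parameter; the paper tunes its own).

    Returns (last prediction, total absolute loss, final weights).
    """
    n = len(expert_preds[0])
    weights = [1.0] * n  # start with uniform nonnegative weights
    total_loss = 0.0
    for preds, y in zip(expert_preds, outcomes):
        # Predict the weighted average of the experts' predictions.
        total_w = sum(weights)
        y_hat = sum(w * p for w, p in zip(weights, preds)) / total_w
        total_loss += abs(y_hat - y)  # absolute loss |Y_hat_t - Y_t|
        # Multiply each weight by a function of that expert's loss
        # this round (here: exp(-eta * loss), an assumed choice).
        weights = [w * math.exp(-eta * abs(p - y))
                   for w, p in zip(weights, preds)]
    return y_hat, total_loss, weights
```

With one always-wrong and one always-right expert, the weight mass shifts quickly onto the good expert, so the learner's cumulative loss stays close to the best expert's in hindsight.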