Mixability of a loss characterizes fast rates in the online learning setting of prediction with expert advice. The determination of the mixability constant for binary losses is straightforward but opaque. In the binary case we make this determination transparent and simpler by characterizing mixability in terms of the second derivative of the Bayes risk of proper losses. We then extend this result to multiclass proper losses, for which few existing results are available. We show that mixability is governed by the maximum eigenvalue of the Hessian of the Bayes risk, relative to the Hessian of the Bayes risk for log loss. We conclude by comparing our result to other work that bounds prediction performance in terms of the geometry of the Bayes risk. Although all our calculations are for proper losses, we also show how to carry the results across to improper losses.
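For concreteness, a minimal numerical sketch of the binary characterization follows, assuming it takes the form eta = inf_p L''_log(p) / L''_loss(p), i.e. the curvature of the Bayes risk of log loss relative to that of the given proper loss. The helper name mixability_constant_binary and the probability grid are illustrative choices, not from the paper.

    import numpy as np

    def mixability_constant_binary(bayes_risk_2nd_deriv, grid=None):
        # Estimate eta = inf_p L''_log(p) / L''_loss(p) over p in (0, 1).
        # Both Bayes risks are concave, so both second derivatives are
        # negative and the ratio is positive.
        if grid is None:
            grid = np.linspace(0.01, 0.99, 9999)
        # Second derivative of the log-loss Bayes risk -p log p - (1-p) log(1-p):
        log_curvature = -1.0 / (grid * (1.0 - grid))
        loss_curvature = bayes_risk_2nd_deriv(grid)
        return float(np.min(log_curvature / loss_curvature))

    # Brier (squared) loss: its Bayes risk 2p(1-p) has constant second
    # derivative -4, so the ratio is 1/(4p(1-p)), minimized at p = 1/2.
    print(mixability_constant_binary(lambda p: -4.0 * np.ones_like(p)))  # ~1.0

The Brier example recovers the known mixability constant 1 of the Brier game, and plugging in the log-loss curvature itself gives a ratio identically equal to 1, consistent with log loss being 1-mixable.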