Let F be a set of M classification procedures with values in $[-1, 1]$. Given a loss function, we want to construct a procedure that mimics, at the best possible rate, the best procedure in F. This fastest rate is called the optimal rate of aggregation. Considering a continuous scale of loss functions with various types of convexity, we prove that the optimal rate of aggregation is either $((\log M)/n)^{1/2}$ or $(\log M)/n$. We prove that, if all M classifiers are binary, the (penalized) Empirical Risk Minimization procedures are suboptimal (even under the margin/low noise condition) when the loss function is somewhat more than convex, whereas, in that case, aggregation procedures with exponential weights achieve the optimal rate of aggregation.
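The exponential-weights aggregate mentioned above combines the M classifiers through a convex combination whose weights decay exponentially in empirical risk, rather than selecting the single empirical risk minimizer as ERM does. Below is a minimal sketch of that idea in Python; the function name, the hinge loss used as the example convex loss, and the temperature parameter (which the paper would tune precisely to attain the optimal rate) are illustrative assumptions, not the paper's exact procedure.

import numpy as np

def exponential_weights(preds, y, loss, temperature=1.0):
    """Exponential-weights aggregation of M classifiers (illustrative sketch).

    preds: (M, n) array; preds[j] holds classifier f_j's values in [-1, 1]
    y: (n,) array of labels in {-1, +1}
    loss: vectorized loss(prediction, label) -> per-sample losses
    temperature: assumed tuning parameter T > 0 controlling weight decay
    Returns the weight vector w; the aggregate is sum_j w[j] * f_j.
    """
    M, n = preds.shape
    # Empirical risk of each classifier under the given loss.
    risks = np.array([loss(preds[j], y).mean() for j in range(M)])
    # Weights proportional to exp(-n * risk / T); subtracting the minimum
    # risk before exponentiating avoids numerical underflow.
    logits = -n * (risks - risks.min()) / temperature
    w = np.exp(logits)
    return w / w.sum()

# Toy usage: 5 noisy classifiers on n = 200 points, hinge loss.
rng = np.random.default_rng(0)
y = rng.choice([-1.0, 1.0], size=200)
preds = np.clip(y + rng.normal(0.0, 1.0, size=(5, 200)), -1.0, 1.0)
hinge = lambda p, t: np.maximum(0.0, 1.0 - t * p)
w = exponential_weights(preds, y, hinge)
aggregate = w @ preds  # convex combination, still valued in [-1, 1]

Unlike ERM, which commits to one of the M procedures, the aggregate averages over all of them; this averaging is what the abstract credits with achieving the fast $(\log M)/n$ rate for sufficiently convex losses.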