Recent theoretical results have shown that the generalization performance of thresholded convex combinations of base classifiers is greatly improved if the underlying convex combination has large margins on the training data (i.e., correct examples are classified well away from the decision boundary). Neural network algorithms and AdaBoost have been shown to implicitly maximize margins, thus providing some theoretical justification for their remarkably good generalization performance. In this paper we are concerned with maximizing the margin explicitly. In particular, we prove a theorem bounding the generalization performance of convex combinations in terms of general cost functions of the margin, in contrast to previous results, which were stated in terms of the particular cost function sgn(θ − margin). We then present a new algorithm, DOOM, for directly optimizing a piecewise-linear family of cost functions satisfying the conditions of the theorem. Experiments on several of the datasets in the UC Irvine database are presented in which AdaBoost was used to generate a set of base classifiers and then DOOM was used to find the optimal convex combination of those classifiers. In all but one case the convex combination generated by DOOM had lower test error than AdaBoost's combination. In many cases DOOM achieves these lower test errors by sacrificing training error, in the interests of reducing the new cost function. In our experiments the margin plots suggest that the size of the minimum margin is not the critical factor in determining generalization performance.
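The central computation described in the abstract is choosing the weights of a convex combination of fixed base classifiers so as to minimize a piecewise-linear cost of the training margins. The sketch below is a hypothetical illustration of that idea, not the authors' DOOM implementation: the ramp-shaped cost `ramp_cost`, the step size, and the projected-subgradient scheme are all assumptions. Because such clipped margin costs are non-convex, this procedure only finds a local optimum.

```python
import numpy as np

def ramp_cost(margins, theta=0.2):
    """Piecewise-linear margin cost: 1 for margin <= 0, decreasing
    linearly to 0 at margin >= theta (a clipped stand-in for the
    step cost sgn(theta - margin))."""
    return np.clip(1.0 - margins / theta, 0.0, 1.0)

def project_to_simplex(w):
    """Euclidean projection onto the probability simplex, keeping
    the combination convex (w >= 0, sum(w) = 1)."""
    u = np.sort(w)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(w) + 1) > (css - 1.0))[0][-1]
    tau = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(w - tau, 0.0)

def optimize_weights(H, y, theta=0.2, lr=0.05, iters=500):
    """Projected subgradient descent on the average ramp cost.
    H[i, t] is the {-1, +1} prediction of base classifier t on
    example i; y[i] is the {-1, +1} label. Returns convex weights."""
    n, T = H.shape
    w = np.full(T, 1.0 / T)              # start from the uniform combination
    for _ in range(iters):
        margins = y * (H @ w)            # margin of each training example
        # Subgradient of the ramp: -y_i * h_t(x_i) / theta on the sloped
        # piece (0 < margin < theta), zero elsewhere.
        active = (margins > 0.0) & (margins < theta)
        grad = -(H[active] * y[active, None]).sum(axis=0) / (theta * n)
        w = project_to_simplex(w - lr * grad)
    return w
```

For example, if AdaBoost has produced T base classifiers, one could stack their training-set predictions into an n-by-T matrix H and call `optimize_weights(H, y)` to re-weight the same classifiers under the margin cost instead of AdaBoost's exponential criterion; sacrificing a few training-set margins below theta can reduce the overall cost, mirroring the training-error/test-error trade-off reported in the abstract.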