Improved Generalization Through Explicit Optimization of Margins

  • Authors:
  • Llew Mason; Peter L. Bartlett; Jonathan Baxter

  • Affiliations:
  • Research School of Information Sciences and Engineering, Australian National University, Canberra, ACT 0200, Australia (all authors)

  • Venue:
  • Machine Learning
  • Year:
  • 2000

Abstract

Recent theoretical results have shown that the generalization performance of thresholded convex combinations of base classifiers is greatly improved if the underlying convex combination has large margins on the training data (i.e., correct examples are classified well away from the decision boundary). Neural network algorithms and AdaBoost have been shown to implicitly maximize margins, thus providing some theoretical justification for their remarkably good generalization performance. In this paper we are concerned with maximizing the margin explicitly. In particular, we prove a theorem bounding the generalization performance of convex combinations in terms of general cost functions of the margin, in contrast to previous results, which were stated in terms of the particular cost function sgn(θ − margin). We then present a new algorithm, DOOM, for directly optimizing a piecewise-linear family of cost functions satisfying the conditions of the theorem. Experiments on several of the datasets in the UC Irvine database are presented in which AdaBoost was used to generate a set of base classifiers and then DOOM was used to find the optimal convex combination of those classifiers. In all but one case the convex combination generated by DOOM had lower test error than AdaBoost's combination. In many cases DOOM achieves these lower test errors by sacrificing training error, in the interests of reducing the new cost function. In our experiments the margin plots suggest that the size of the minimum margin is not the critical factor in determining generalization performance.
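The key quantities in the abstract can be made concrete with a small sketch: the margin of an example under a convex combination of ±1-valued base classifiers, and a cost that is a piecewise-linear function of that margin rather than the step function sgn(θ − margin). This is an illustrative toy only (the cost shape, the threshold `theta`, and the data are assumptions, not the paper's DOOM implementation):

```python
import numpy as np

def margins(H, y, w):
    """Margins y * f(x) of a convex combination f = sum_t w_t h_t.

    H : (n_examples, n_classifiers) base-classifier outputs in {-1, +1}
    y : (n_examples,) labels in {-1, +1}
    w : (n_classifiers,) non-negative combination weights
    """
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                 # enforce the convex (sum-to-one) constraint
    return y * (H @ w)              # per-example margin in [-1, 1]

def piecewise_linear_cost(m, theta=0.2):
    """A hypothetical piecewise-linear margin cost (illustrative only):
    1 for margins <= 0, decreasing linearly to 0 at m = theta, 0 beyond.
    Unlike the step cost sgn(theta - m), it is continuous in the margin."""
    return np.clip(1.0 - m / theta, 0.0, 1.0)

# Toy data: 3 base classifiers evaluated on 4 training examples.
H = np.array([[+1, +1, -1],
              [+1, -1, +1],
              [-1, +1, +1],
              [+1, +1, +1]])
y = np.array([+1, +1, -1, +1])
w = np.array([0.5, 0.3, 0.2])       # e.g. weights produced by a booster

m = margins(H, y, w)                # → [0.6, 0.4, 0.0, 1.0]
cost = piecewise_linear_cost(m).mean()
```

Only the third example (margin 0, i.e. on the decision boundary) incurs cost here, so the average cost is 0.25; an optimizer like DOOM would adjust `w` to reduce this average margin cost directly, possibly at the expense of training error.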