An Experimental and Theoretical Comparison of Model Selection Methods

  • Authors:
  • Michael Kearns, Yishay Mansour, Andrew Y. Ng, Dana Ron

  • Affiliations:
  • AT&T Laboratories Research, Murray Hill, NJ. E-mail: mkearns@research.att.com
  • Department of Computer Science, Tel Aviv University, Tel Aviv, Israel. E-mail: mansour@math.tau.ac.il
  • Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA. E-mail: Andrew.Ng@cs.cmu.edu
  • Laboratory of Computer Science, MIT, Cambridge, MA. E-mail: danar@theory.lcs.mit.edu

  • Venue:
  • Machine Learning - Special issue on the Eighth Annual Conference on Computational Learning Theory (COLT '95)
  • Year:
  • 1997

Abstract

We investigate the problem of model selection in the setting of supervised learning of boolean functions from independent random examples. More precisely, we compare methods for finding a balance between the complexity of the hypothesis chosen and its observed error on a random training sample of limited size, when the goal is that of minimizing the resulting generalization error. We undertake a detailed comparison of three well-known model selection methods: a variation of Vapnik's Guaranteed Risk Minimization (GRM), an instance of Rissanen's Minimum Description Length Principle (MDL), and (hold-out) cross validation (CV). We introduce a general class of model selection methods (called penalty-based methods) that includes both GRM and MDL, and provide general methods for analyzing such rules. We provide both controlled experimental evidence and formal theorems to support the following conclusions:

  • Even on simple model selection problems, the behavior of the methods examined can be both complex and incomparable. Furthermore, no amount of "tuning" of the rules investigated (such as introducing constant multipliers on the complexity penalty terms, or a distribution-specific "effective dimension") can eliminate this incomparability.

  • It is possible to give rather general bounds on the generalization error, as a function of sample size, for penalty-based methods. The quality of such bounds depends in a precise way on the extent to which the method considered automatically limits the complexity of the hypothesis selected.

  • For any model selection problem, the additional error of cross validation compared to any other method can be bounded above by the sum of two terms. The first term is large only if the learning curve of the underlying function classes experiences a "phase transition" between (1 - γ)m and m examples (where γ is the fraction of the sample saved for testing in CV). The second and competing term can be made arbitrarily small by increasing γ.

  • The class of penalty-based methods is fundamentally handicapped in the sense that there exist two types of model selection problems for which every penalty-based method must incur large generalization error on at least one, while CV enjoys small generalization error on both.
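
To make the two families of rules concrete, the sketch below contrasts a generic penalty-based selection rule with hold-out cross validation on a toy problem. It is illustrative only: the sqrt(d/m) penalty, the histogram-style hypothesis classes, and all helper names are assumptions made for this example, not the paper's GRM or MDL penalties, which take specific forms not reproduced here.

```python
# Illustrative sketch (not the paper's exact GRM/MDL rules): a generic
# penalty-based model selection rule vs. hold-out cross validation (CV).
import random

def make_sample(m, noise=0.2):
    """Random points in [0,1] labeled by the target x < 0.5, with label noise."""
    xs = [random.random() for _ in range(m)]
    ys = [(x < 0.5) != (random.random() < noise) for x in xs]
    return list(zip(xs, ys))

def fit_and_error(train, test, d):
    """Toy nested classes indexed by complexity d: a histogram rule with d
    equal-width bins. Fit by bin-majority vote on train, return error on test."""
    votes = [[0, 0] for _ in range(d)]
    for x, y in train:
        votes[min(int(x * d), d - 1)][int(y)] += 1
    def h(x):
        neg, pos = votes[min(int(x * d), d - 1)]
        return pos >= neg
    return sum(h(x) != y for x, y in test) / len(test)

def penalty_select(sample, d_max, penalty):
    """Penalty-based rule: pick d minimizing training error + penalty(d, m)."""
    m = len(sample)
    return min(range(1, d_max + 1),
               key=lambda d: fit_and_error(sample, sample, d) + penalty(d, m))

def cv_select(sample, d_max, gamma=0.3):
    """Hold-out CV: fit on (1 - gamma)m examples, pick d by hold-out error."""
    split = int((1 - gamma) * len(sample))
    train, hold = sample[:split], sample[split:]
    return min(range(1, d_max + 1),
               key=lambda d: fit_and_error(train, hold, d))

if __name__ == "__main__":
    random.seed(0)
    s = make_sample(500)
    sqrt_penalty = lambda d, m: (d / m) ** 0.5  # generic complexity penalty (assumption)
    print("penalty-based choice of d:", penalty_select(s, 20, sqrt_penalty))
    print("cross-validation choice of d:", cv_select(s, 20))
```

In this toy setup the penalty-based rule trades training error against the assumed complexity charge over the whole sample, while CV sacrifices a γ fraction of the data to obtain an unbiased estimate of each class's error, which mirrors the trade-off analyzed in the paper's bounds.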