An Experimental and Theoretical Comparison of Model Selection Methods

  • Authors:
  • Michael Kearns, Yishay Mansour, Andrew Y. Ng, Dana Ron

  • Affiliations:
  • AT&T Laboratories Research, Murray Hill, NJ. E-mail: mkearns@research.att.com
  • Department of Computer Science, Tel Aviv University, Tel Aviv, Israel. E-mail: mansour@math.tau.ac.il
  • Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA. E-mail: Andrew.Ng@cs.cmu.edu
  • Laboratory of Computer Science, MIT, Cambridge, MA. E-mail: danar@theory.lcs.mit.edu

  • Venue:
  • Machine Learning - Special issue on the Eighth Annual Conference on Computational Learning Theory (COLT '95)
  • Year:
  • 1997

Abstract

We investigate the problem of model selection in the setting of supervised learning of boolean functions from independent random examples. More precisely, we compare methods for finding a balance between the complexity of the hypothesis chosen and its observed error on a random training sample of limited size, when the goal is that of minimizing the resulting generalization error. We undertake a detailed comparison of three well-known model selection methods: a variation of Vapnik's Guaranteed Risk Minimization (GRM), an instance of Rissanen's Minimum Description Length Principle (MDL), and (hold-out) cross validation (CV). We introduce a general class of model selection methods (called penalty-based methods) that includes both GRM and MDL, and provide general methods for analyzing such rules. We provide both controlled experimental evidence and formal theorems to support the following conclusions:

  • Even on simple model selection problems, the behavior of the methods examined can be both complex and incomparable. Furthermore, no amount of "tuning" of the rules investigated (such as introducing constant multipliers on the complexity penalty terms, or a distribution-specific "effective dimension") can eliminate this incomparability.

  • It is possible to give rather general bounds on the generalization error, as a function of sample size, for penalty-based methods. The quality of such bounds depends in a precise way on the extent to which the method considered automatically limits the complexity of the hypothesis selected.

  • For any model selection problem, the additional error of cross validation compared to any other method can be bounded above by the sum of two terms. The first term is large only if the learning curve of the underlying function classes experiences a "phase transition" between (1 - γ)m and m examples (where γ is the fraction of the sample saved for testing in CV). The second and competing term can be made arbitrarily small by increasing γ.

  • The class of penalty-based methods is fundamentally handicapped in the sense that there exist two types of model selection problems for which every penalty-based method must incur large generalization error on at least one, while CV enjoys small generalization error on both.
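
To make the two families of rules concrete, the sketch below contrasts a generic penalty-based selection rule with hold-out cross validation on a toy problem. It is illustrative only: the sqrt(d/m) penalty, the histogram-style hypothesis classes, and all helper names are assumptions made for this example, not the paper's GRM or MDL penalties, which take specific forms not reproduced here.

```python
# Illustrative sketch (not the paper's exact GRM/MDL rules): a generic
# penalty-based model selection rule vs. hold-out cross validation (CV).
import random

def make_sample(m, noise=0.2):
    """Random points in [0,1] labeled by the target x < 0.5, with label noise."""
    xs = [random.random() for _ in range(m)]
    ys = [(x < 0.5) != (random.random() < noise) for x in xs]
    return list(zip(xs, ys))

def fit_and_error(train, test, d):
    """Toy nested classes indexed by complexity d: a histogram rule with d
    equal-width bins. Fit by bin-majority vote on train, return error on test."""
    votes = [[0, 0] for _ in range(d)]
    for x, y in train:
        votes[min(int(x * d), d - 1)][int(y)] += 1
    def h(x):
        neg, pos = votes[min(int(x * d), d - 1)]
        return pos >= neg
    return sum(h(x) != y for x, y in test) / len(test)

def penalty_select(sample, d_max, penalty):
    """Penalty-based rule: pick d minimizing training error + penalty(d, m)."""
    m = len(sample)
    return min(range(1, d_max + 1),
               key=lambda d: fit_and_error(sample, sample, d) + penalty(d, m))

def cv_select(sample, d_max, gamma=0.3):
    """Hold-out CV: fit on (1 - gamma)m examples, pick d by hold-out error."""
    split = int((1 - gamma) * len(sample))
    train, hold = sample[:split], sample[split:]
    return min(range(1, d_max + 1),
               key=lambda d: fit_and_error(train, hold, d))

if __name__ == "__main__":
    random.seed(0)
    s = make_sample(500)
    sqrt_penalty = lambda d, m: (d / m) ** 0.5  # generic complexity penalty (assumption)
    print("penalty-based choice of d:", penalty_select(s, 20, sqrt_penalty))
    print("cross-validation choice of d:", cv_select(s, 20))
```

In this toy setup the penalty-based rule trades training error against the assumed complexity charge over the whole sample, while CV sacrifices a γ fraction of the data to obtain an unbiased estimate of each class's error, which mirrors the trade-off analyzed in the paper's bounds.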