Computational limitations on learning from examples
Journal of the ACM (JACM)
Inferring decision trees using the minimum description length principle
Information and Computation
Training a 3-node neural network is NP-complete
Advances in neural information processing systems 1
Elements of information theory
Toward efficient agnostic learning
COLT '92 Proceedings of the fifth annual workshop on Computational learning theory
Rigorous learning curve bounds from statistical mechanics
COLT '94 Proceedings of the seventh annual conference on Computational learning theory
Stochastic Complexity in Statistical Inquiry
Estimation of Dependences Based on Empirical Data (Springer Series in Statistics)
Estimating the expected error of empirical minimizers for model selection
AAAI '98/IAAI '98 Proceedings of the fifteenth national/tenth conference on Artificial intelligence/Innovative applications of artificial intelligence
Viewing all models as “probabilistic”
COLT '99 Proceedings of the twelfth annual conference on Computational learning theory
MML clustering of multi-state, Poisson, von Mises circular and Gaussian distributions
Statistics and Computing
Nonparametric Regularization of Decision Trees
ECML '00 Proceedings of the 11th European Conference on Machine Learning
Finding Cutpoints in Noisy Binary Sequences - A Revised Empirical Evaluation
AI '99 Proceedings of the 12th Australian Joint Conference on Artificial Intelligence: Advanced Topics in Artificial Intelligence
Repeated Measures Multiple Comparison Procedures Applied to Model Selection in Neural Networks
IWANN '01 Proceedings of the 6th International Work-Conference on Artificial and Natural Neural Networks: Bio-inspired Applications of Connectionism-Part II
Change-Point Estimation Using New Minimum Message Length Approximations
PRICAI '02 Proceedings of the 7th Pacific Rim International Conference on Artificial Intelligence: Trends in Artificial Intelligence
Some Elements of Machine Learning (Extended Abstract)
ILP '99 Proceedings of the 9th International Workshop on Inductive Logic Programming
Further Explanation of the Effectiveness of Voting Methods: The Game between Margins and Weights
COLT '01/EuroCOLT '01 Proceedings of the 14th Annual Conference on Computational Learning Theory and 5th European Conference on Computational Learning Theory
Rademacher and Gaussian Complexities: Risk Bounds and Structural Results
COLT '01/EuroCOLT '01 Proceedings of the 14th Annual Conference on Computational Learning Theory and 5th European Conference on Computational Learning Theory
Learning kernel-based HMMs for dynamic sequence synthesis
Graphical Models - Special issue on Pacific graphics 2002
Rademacher and Gaussian Complexities: Risk Bounds and Structural Results
The Journal of Machine Learning Research
Asymptotics in Empirical Risk Minimization
The Journal of Machine Learning Research
Estimation of the conditional risk in classification: The swapping method
Computational Statistics & Data Analysis
An Information Criterion for Variable Selection in Support Vector Machines
The Journal of Machine Learning Research
Proper Model Selection with Significance Test
ECML PKDD '08 Proceedings of the 2008 European Conference on Machine Learning and Knowledge Discovery in Databases - Part I
Discriminative model selection for belief net structures
AAAI'05 Proceedings of the 20th national conference on Artificial intelligence - Volume 2
Robustness and Regularization of Support Vector Machines
The Journal of Machine Learning Research
Subspace analysis and optimization for AAM based face alignment
FGR '04 Proceedings of the Sixth IEEE International Conference on Automatic Face and Gesture Recognition
PAC-Bayesian Analysis of Co-clustering and Beyond
The Journal of Machine Learning Research
Segmentation of the mean of heteroscedastic data via cross-validation
Statistics and Computing
We investigate the problem of model selection in the setting of supervised learning of boolean functions from independent random examples. More precisely, we compare methods for finding a balance between the complexity of the hypothesis chosen and its observed error on a random training sample of limited size, when the goal is to minimize the resulting generalization error. We undertake a detailed comparison of three well-known model selection methods: a variation of Vapnik's Guaranteed Risk Minimization (GRM), an instance of Rissanen's Minimum Description Length Principle (MDL), and (hold-out) cross validation (CV). We introduce a general class of model selection methods, called penalty-based methods, that includes both GRM and MDL, and provide general methods for analyzing such rules. We provide both controlled experimental evidence and formal theorems to support the following conclusions:

• Even on simple model selection problems, the behavior of the methods examined can be both complex and incomparable. Furthermore, no amount of "tuning" of the rules investigated (such as introducing constant multipliers on the complexity penalty terms, or a distribution-specific "effective dimension") can eliminate this incomparability.

• It is possible to give rather general bounds on the generalization error, as a function of sample size, for penalty-based methods. The quality of such bounds depends in a precise way on the extent to which the method considered automatically limits the complexity of the hypothesis selected.

• For any model selection problem, the additional error of cross validation compared to any other method can be bounded above by the sum of two terms. The first term is large only if the learning curve of the underlying function classes experiences a "phase transition" between $(1-\gamma)m$ and $m$ examples, where $\gamma$ is the fraction of the sample saved for testing in CV. The second and competing term can be made arbitrarily small by increasing $\gamma$.

• The class of penalty-based methods is fundamentally handicapped in the sense that there exist two types of model selection problems for which every penalty-based method must incur large generalization error on at least one, while CV enjoys small generalization error on both.