Model Selection and Error Estimation

  • Authors:
  • Peter L. Bartlett; Stéphane Boucheron; Gábor Lugosi

  • Affiliations:
  • Peter L. Bartlett: BIOwulf Technologies, 2030 Addison Street, Suite 102, Berkeley, CA 94704, USA. Peter.Bartlett@anu.edu.au
  • Stéphane Boucheron: Laboratoire de Recherche en Informatique, Bâtiment 490, CNRS-Université Paris-Sud, 91405 Orsay-Cedex, France. bouchero@lri.fr
  • Gábor Lugosi: Department of Economics, Pompeu Fabra University, Ramon Trias Fargas 25-27, 08005 Barcelona, Spain. lugosi@upf.es

  • Venue:
  • Machine Learning
  • Year:
  • 2002

Abstract

We study model selection strategies based on penalized empirical loss minimization. We point out a tight relationship between error estimation and data-based complexity penalization: any good error estimate may be converted into a data-based penalty function, and the performance of the resulting estimate is governed by the quality of the error estimate. We consider several penalty functions, involving error estimates on independent test data, empirical VC dimension, empirical VC entropy, and margin-based quantities. We also consider the maximal difference between the error on the first half of the training data and the error on the second half, as well as the expected maximal discrepancy, a closely related capacity estimate that can be calculated by Monte Carlo integration. Maximal discrepancy penalty functions are appealing for pattern classification problems, since their computation is equivalent to empirical risk minimization over the training data with some labels flipped, as the sketch below illustrates.
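The label-flipping equivalence in the last sentence can be made concrete. Below is a minimal sketch, assuming 0-1 loss and labels in {-1, +1}; the function `erm_fit` is a hypothetical stand-in for whatever learner performs empirical risk minimization over the model class (here a toy class of one-dimensional threshold classifiers), and the normalization of the penalty follows the pointwise identity err_1(f) - err_2(f) = 1 - 2·err_flip(f), where err_flip(f) is f's error on the full sample after negating the first-half labels. The paper's exact scaling of the penalty may differ.

```python
import numpy as np

def maximal_discrepancy_penalty(X, y, erm_fit):
    """Maximal discrepancy of a model class via ERM with flipped labels.

    For 0-1 loss and labels in {-1, +1}, every classifier f satisfies
        err_half1(f) - err_half2(f) = 1 - 2 * err_flip(f),
    where err_flip(f) is f's empirical error on the whole sample after
    negating the labels of the first half. Hence the maximal discrepancy
    equals 1 - 2 * (minimal empirical error on the flipped sample).
    """
    n = len(y) // 2
    y_flip = y.copy()
    y_flip[:n] = -y_flip[:n]           # flip labels on the first half
    f = erm_fit(X, y_flip)             # ERM over the model class
    err = np.mean(f(X) != y_flip)      # minimized empirical error
    return 1.0 - 2.0 * err             # maximal discrepancy

def erm_fit(X, y):
    """Toy ERM over one-dimensional threshold classifiers (illustrative only)."""
    best, best_err = None, np.inf
    for t in X[:, 0]:
        for s in (-1.0, 1.0):
            f = lambda Z, t=t, s=s: np.where(s * (Z[:, 0] - t) >= 0, 1, -1)
            e = np.mean(f(X) != y)
            if e < best_err:
                best, best_err = f, e
    return best

# Example: 200 points in one dimension, labels from a noisy threshold rule.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = np.where(X[:, 0] + 0.3 * rng.normal(size=200) > 0, 1, -1)
print(maximal_discrepancy_penalty(X, y, erm_fit))
```

The expected maximal discrepancy mentioned in the abstract would, under this reading, be estimated by averaging such a computation over independent random reshufflings of the sample, which is the Monte Carlo integration referred to above.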