Towards robust model selection using estimation and approximation error bounds
COLT '96 Proceedings of the ninth annual conference on Computational learning theory
In pattern recognition or, as it has also been called, concept learning, the value of a {0,1}-valued random variable Y is to be predicted based upon observing an R^d-valued random variable X. We apply the method of complexity regularization to learn concepts from large concept classes. The method is shown to automatically find a good balance between the approximation error and the estimation error. In particular, the difference between the error probability of the obtained classifier and the achievable optimum is shown to decrease as O(√(log n / n)) for large nonparametric classes of distributions as the sample size n grows. We also show that if the Bayes error probability is zero and the Bayes rule belongs to a known family of decision rules, then the error probability is O(log n / n) for many large families, possibly with infinite VC dimension.
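The selection rule can be summarized concretely: within each candidate class, fit an empirical risk minimizer, then choose the fitted classifier that minimizes empirical error plus a complexity penalty of order √(log n / n). The sketch below is a minimal illustration of this idea, not the paper's exact procedure; the penalty constant, the two toy model classes (thresholds and intervals on the real line), and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def penalty(vc_dim, n):
    # Penalty of order sqrt(log n / n), the rate quoted in the abstract;
    # the multiplicative constant (here 1) is an illustrative assumption.
    return np.sqrt(vc_dim * np.log(n) / n)

def emp_error(h, X, y):
    return np.mean(h(X) != y)

def fit_threshold(X, y):
    # Empirical risk minimizer over thresholds h_t(x) = 1{x > t} (VC dim 1).
    t = min(np.unique(X), key=lambda t: np.mean((X > t) != y))
    return lambda Z, t=t: (Z > t).astype(int)

def fit_interval(X, y):
    # Empirical risk minimizer over intervals h_{a,b}(x) = 1{a < x <= b} (VC dim 2).
    cs = np.unique(X)
    a, b = min(((a, b) for a in cs for b in cs if a < b),
               key=lambda p: np.mean(((X > p[0]) & (X <= p[1])) != y))
    return lambda Z, a=a, b=b: ((Z > a) & (Z <= b)).astype(int)

def select(classes, X, y):
    # Complexity regularization: over classes of increasing VC dimension,
    # pick the ERM classifier minimizing empirical error + penalty.
    n = len(y)
    fits = [(fit(X, y), d) for fit, d in classes]
    return min(fits, key=lambda hd: emp_error(hd[0], X, y) + penalty(hd[1], n))[0]

# Toy problem whose Bayes rule is an interval, with 10% label noise.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, 200)
y = ((X > -0.3) & (X <= 0.4)).astype(int) ^ (rng.random(200) < 0.1)
h = select([(fit_threshold, 1), (fit_interval, 2)], X, y)
print("training error of selected rule:", emp_error(h, X, y))
```

On this toy data the interval class should be selected despite its larger penalty, since no single threshold can represent the Bayes rule; with far fewer samples the penalty term would tip the balance toward the simpler class, which is the approximation/estimation trade-off the abstract describes.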