Model Selection for Small Sample Regression
Machine Learning
Metric-based methods have recently been introduced for model selection and regularization, often yielding significant improvements over alternatives such as cross-validation. All of these methods require unlabeled data over which to compare functions and detect gross differences in behavior away from the training points. We introduce three new extensions of the metric model selection methods and apply them to feature selection. The first extension exploits the particular case of time-series data in which the task is prediction with a horizon h: the idea is to use, at time t, the h unlabeled examples that immediately precede t for model selection. The second extension exploits the different error distributions of cross-validation and the metric methods: cross-validation tends to have a larger variance but is unbiased, and a hybrid combining the two model selection methods is rarely beaten by either of them alone. The third extension addresses the case where no unlabeled data is available at all, by using an estimated input density. Experiments studying these extensions in the context of capacity control and feature subset selection are described.
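To make the core idea concrete, the sketch below illustrates one metric-based selection heuristic of the kind the abstract builds on (not the authors' exact procedure): an ADJ-style adjusted training error, in which each candidate's empirical error is inflated by the worst-case ratio of its disagreement with simpler models on unlabeled inputs versus on the labeled training inputs. The PolyModel wrapper, function names, and synthetic data are illustrative assumptions; for the time-series extension described above, x_unlabeled would simply be the h unlabeled inputs that precede the prediction time t.

```python
import numpy as np


class PolyModel:
    """Tiny wrapper so polynomial fits expose a .predict interface (illustrative only)."""
    def __init__(self, degree, x, y):
        self.coeffs = np.polyfit(x, y, degree)

    def predict(self, x):
        return np.polyval(self.coeffs, x)


def adj_model_selection(models, x_train, y_train, x_unlabeled):
    """Select a model from a nested sequence by an ADJ-style adjusted training error.

    Each model's empirical error is multiplied by the largest ratio of its
    disagreement with a simpler model on unlabeled inputs versus on the
    training inputs; large ratios flag erratic behavior away from the
    training points.
    """
    def disagreement(f, g, x):
        # Mean absolute difference between two hypotheses on inputs x.
        return np.mean(np.abs(f.predict(x) - g.predict(x)))

    eps = 1e-12
    scores = []
    for k, f_k in enumerate(models):
        penalty = 1.0
        for f_j in models[:k]:
            ratio = disagreement(f_j, f_k, x_unlabeled) / max(
                disagreement(f_j, f_k, x_train), eps)
            penalty = max(penalty, ratio)
        train_err = np.mean(np.abs(f_k.predict(x_train) - y_train))
        scores.append(train_err * penalty)
    return int(np.argmin(scores)), scores


# Usage sketch: nested polynomial models of increasing degree, with a pool of
# unlabeled inputs (no targets) used only to compare model behavior.
rng = np.random.default_rng(0)
x_train = rng.uniform(-1, 1, size=20)
y_train = np.sin(3 * x_train) + 0.1 * rng.normal(size=20)
x_unlabeled = rng.uniform(-1, 1, size=200)
models = [PolyModel(d, x_train, y_train) for d in range(1, 8)]
best, scores = adj_model_selection(models, x_train, y_train, x_unlabeled)
```

A hybrid of the kind mentioned in the abstract could, for instance, fall back to the cross-validation choice whenever the two criteria disagree strongly; the scores returned above would be one of the two inputs to such a rule.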