Pattern recognition: statistical, structural and neural approaches.
C4.5: programs for machine learning.
Floating search methods in feature selection. Pattern Recognition Letters.
Adaptive floating search methods in feature selection. Pattern Recognition Letters (special issue on Pattern Recognition in Practice VI).
Multiple comparisons in induction algorithms. Machine Learning.
An introduction to variable and feature selection. The Journal of Machine Learning Research.
Overfitting in making comparisons between variable selection methods. The Journal of Machine Learning Research.
A direct method of nonparametric measurement selection. IEEE Transactions on Computers.
A study of cross-validation and bootstrap for accuracy estimation and model selection. IJCAI'95: Proceedings of the 14th International Joint Conference on Artificial Intelligence, Volume 2.
In feature selection, classification accuracy typically needs to be estimated in order to guide the search towards useful subsets. It has been shown earlier [1] that such estimates should not be used directly to determine the optimal subset size, or the benefit gained by choosing the optimal set: owing to a phenomenon called overfitting, these estimates tend to be biased. An outer loop of cross-validation has previously been suggested as a remedy for this problem. However, this paper points out that a straightforward implementation of such an approach still gives biased estimates of the increase in accuracy that could be obtained by selecting the best-performing subset. In addition, two methods are suggested that circumvent this problem and give virtually unbiased results with almost no additional computational overhead.
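As background to the overfitting phenomenon the abstract describes, the following is a minimal, hypothetical sketch (not the paper's actual method): selecting a feature subset by its inner cross-validation score and reporting that winning score is optimistically biased, whereas an outer cross-validation loop that reruns the whole selection procedure on each outer training split yields a more honest accuracy estimate. The synthetic data, the toy nearest-mean classifier, and all names are illustrative assumptions.

```python
# Illustrative sketch of selection bias and an outer CV loop around
# feature selection. Synthetic data and toy classifier are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def nearest_mean_accuracy(X_tr, y_tr, X_te, y_te):
    """Toy classifier: assign each test point to the class whose
    training-set mean is nearest in Euclidean distance."""
    means = np.stack([X_tr[y_tr == c].mean(axis=0) for c in (0, 1)])
    d = ((X_te[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return float((d.argmin(axis=1) == y_te).mean())

def cv_accuracy(X, y, k=5):
    """Plain k-fold cross-validation accuracy estimate."""
    folds = np.array_split(rng.permutation(len(y)), k)
    accs = []
    for i in range(k):
        te = folds[i]
        tr = np.concatenate([folds[j] for j in range(k) if j != i])
        accs.append(nearest_mean_accuracy(X[tr], y[tr], X[te], y[te]))
    return float(np.mean(accs))

def select_subset(X, y, subsets):
    """Pick the subset with the best inner-CV accuracy and return it
    together with that (optimistic) score."""
    scores = [cv_accuracy(X[:, list(s)], y) for s in subsets]
    best = int(np.argmax(scores))
    return subsets[best], scores[best]

# Synthetic data: 2 weakly informative features among 20 noise features.
n, p = 200, 22
X = rng.normal(size=(n, p))
y = rng.integers(0, 2, size=n)
X[:, 0] += 0.4 * (2 * y - 1)
X[:, 1] += 0.4 * (2 * y - 1)

# Candidate subsets: all singletons and all pairs of features.
subsets = [(i,) for i in range(p)] + \
          [(i, j) for i in range(p) for j in range(i + 1, p)]

# Biased route: select on all data, report the winning inner-CV score.
_, biased_estimate = select_subset(X, y, subsets)

# Less biased route: outer CV loop in which the *entire* selection
# procedure is rerun from scratch on each outer training split.
outer = np.array_split(rng.permutation(n), 5)
outer_accs = []
for i in range(5):
    te = outer[i]
    tr = np.concatenate([outer[j] for j in range(5) if j != i])
    s, _ = select_subset(X[tr], y[tr], subsets)
    cols = list(s)
    outer_accs.append(nearest_mean_accuracy(X[tr][:, cols], y[tr],
                                            X[te][:, cols], y[te]))
unbiased_estimate = float(np.mean(outer_accs))

print(f"inner-CV score of winning subset (biased): {biased_estimate:.3f}")
print(f"outer-CV estimate of the procedure:        {unbiased_estimate:.3f}")
```

Note that even this outer loop only estimates the accuracy of the selection procedure as a whole; as the paper argues, a straightforward implementation still misestimates the *increase* in accuracy attributable to picking the best subset.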