For a fixed sample size, a common phenomenon is that the error of a designed classifier decreases and then increases as the number of features grows. This peaking phenomenon has been recognized for forty years and depends on the classification rule and the feature-label distribution. Historically, the peaking phenomenon has been treated by assuming a fixed ordering of the features, usually beginning with the strongest individual feature and proceeding with features of decreasing individual classification capability. This does not take into account feature selection, which is commonplace in high-dimensional, small-sample settings. This paper revisits the peaking phenomenon in the presence of feature selection. Using massive simulation in a high-performance computing environment, the paper considers various combinations of feature-label models, feature-selection algorithms, and classifier models to produce a large library of error-versus-feature-size curves. Owing to the prevalence of feature selection in genomic classification, we also consider gene-expression-based classification of breast-cancer patient prognosis. Results vary widely and are strongly dependent on the combination. The error curves tend to fall into three categories: peaking, settling into a plateau, or falling very slowly over a long range of feature set sizes. It can be concluded that one should be wary of applying peaking results found in the absence of feature selection to settings in which feature selection is employed.
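The classical peaking phenomenon described above (error falling, then rising, as features are added in a fixed order of decreasing individual strength) can be reproduced with a small simulation. The sketch below is not the paper's experimental setup; it is a minimal illustration assuming Gaussian classes whose per-feature mean separation decays as 1/k, classified by plug-in LDA with a pooled sample covariance (pseudo-inverse for numerical stability):

```python
import numpy as np

def peaking_curve(n_train=20, max_d=30, n_test=2000, n_trials=30, seed=0):
    """Average test error of plug-in LDA vs. number of features.

    Features are added in a fixed order from strongest to weakest
    (per-feature class-mean separation 2/k for feature k), as in the
    classical peaking studies. Returns an array of length max_d.
    """
    rng = np.random.default_rng(seed)
    errors = np.zeros(max_d)
    for _ in range(n_trials):
        for d in range(1, max_d + 1):
            mu = 1.0 / np.arange(1, d + 1)  # class 1 mean; class 0 mean is -mu
            X0 = rng.normal(-mu, 1.0, size=(n_train, d))
            X1 = rng.normal(mu, 1.0, size=(n_train, d))
            # Plug-in LDA: pooled sample covariance, pseudo-inverse
            m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
            Xc = np.vstack([X0 - m0, X1 - m1])
            S = Xc.T @ Xc / (2 * n_train - 2)
            w = np.linalg.pinv(S) @ (m1 - m0)
            b = -0.5 * w @ (m0 + m1)
            # Estimate the true error on a fresh test sample
            T0 = rng.normal(-mu, 1.0, size=(n_test, d))
            T1 = rng.normal(mu, 1.0, size=(n_test, d))
            err = ((T0 @ w + b > 0).mean() + (T1 @ w + b <= 0).mean()) / 2
            errors[d - 1] += err
    return errors / n_trials

curve = peaking_curve()
print("optimal feature count:", int(np.argmin(curve)) + 1)
```

With a fixed training sample of 40 points, the curve typically falls for the first handful of features and then climbs as weak features inflate the covariance-estimation burden; the minimum sits well below the maximum dimension, which is the peaking behavior at issue. Note that this fixed-ordering setup is exactly the historical treatment the paper argues does not transfer directly to pipelines that perform feature selection on the data.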