On understanding and assessing feature selection bias

Authors:
Šarunas Raudys; Richard Baumgartner; Ray Somorjai
Affiliations:
Vilnius Gediminas Technical University, Vilnius, Lithuania; Institute for Biodiagnostics, National Research Council Canada, Winnipeg, MB, Canada; Institute for Biodiagnostics, National Research Council Canada, Winnipeg, MB, Canada
Venue:
AIME'05 Proceedings of the 10th conference on Artificial Intelligence in Medicine
Year:
2005

Abstract

Feature selection in high-dimensional biomedical data, such as gene expression arrays or biomedical spectra, constitutes an important step towards biomarker discovery. Controlling feature selection bias is considered a major issue for a realistic assessment of the feature selection process. We propose a theoretical, probabilistic framework for the analysis of selection bias. In particular, we derive a means of calculating the true selection error when the performance estimates of the feature subsets are mutually dependent and the distribution density of the true error is arbitrary. In an extensive series of experiments on real-world datasets, we demonstrate the utility of the theoretical derivations. We discuss the importance of understanding feature selection bias in the small sample size (n) / high dimensionality (p) setting typical of biomedical data (genomics, proteomics, spectroscopy).
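Selection bias can be illustrated with a small Monte Carlo simulation. Many candidate feature subsets are scored on a finite test set, and the subset with the lowest *estimated* error is kept. That minimum estimate is optimistically biased relative to the subset's *true* error. What follows is a minimal sketch under stated assumptions, not the authors' framework. Each subset's true error is drawn from an arbitrary user-supplied sampler, mirroring the "arbitrary density" in the abstract. Each estimate is a binomial counting error on a test set of `n_test` cases. The estimates are drawn **independently** across subsets, which simplifies away the mutually dependent case the paper actually treats. The function name `simulate_selection_bias` and all parameters are hypothetical.

```python
import random
import statistics


def simulate_selection_bias(n_subsets, n_test, true_error_sampler, n_trials, seed=0):
    """Monte Carlo sketch of feature selection bias (hypothetical helper).

    Assumptions (illustrative only, not the paper's method):
    - each of n_subsets candidate feature subsets has a true error drawn
      from true_error_sampler(rng), an arbitrary distribution;
    - each subset's error is estimated by counting misclassifications on
      n_test independent test cases (a binomial counting estimate);
    - estimates are independent across subsets (the paper handles the
      mutually dependent case, which this sketch does not).

    Returns (mean apparent error of the selected subset,
             mean true error of the selected subset) over n_trials runs.
    """
    rng = random.Random(seed)
    apparent, true_of_selected = [], []
    for _ in range(n_trials):
        true_errs = [true_error_sampler(rng) for _ in range(n_subsets)]
        # Each test case is misclassified with probability equal to the true error.
        est_errs = [
            sum(rng.random() < e for _ in range(n_test)) / n_test
            for e in true_errs
        ]
        # Select the subset that *looks* best on the finite test set.
        best = min(range(n_subsets), key=lambda i: est_errs[i])
        apparent.append(est_errs[best])
        true_of_selected.append(true_errs[best])
    return statistics.mean(apparent), statistics.mean(true_of_selected)


if __name__ == "__main__":
    # All 50 subsets are equally good (true error 0.3), yet the winner looks better.
    app, tru = simulate_selection_bias(50, 40, lambda rng: 0.3, 200)
    print(f"apparent={app:.3f} true={tru:.3f} bias={tru - app:.3f}")
```

In the first usage case every subset has the same true error of 0.3. Any apparent advantage of the selected subset is therefore pure selection bias. The gap between the two returned means widens as the number of candidate subsets grows or the test set shrinks, which corresponds to the small-n / large-p regime the abstract describes.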