The ties problem resulting from counting-based error estimators and its impact on gene selection algorithms

Authors:
Xin Zhou;K. Z. Mao
Affiliations:
School of Electrical & Electronic Engineering, Nanyang Technological University, Nanyang Avenue Singapore 639798, Singapore;School of Electrical & Electronic Engineering, Nanyang Technological University, Nanyang Avenue Singapore 639798, Singapore
Venue:
Bioinformatics
Year:
2006

Citing: 0
Cited by: 5

Decorrelation of the true and estimated classifier errors in high-dimensional settings (EURASIP Journal on Bioinformatics and Systems Biology)
The peaking phenomenon in the presence of feature-selection (Pattern Recognition Letters)
A Probabilistic mechanism based on clustering analysis and distance measure for subset gene selection (Expert Systems with Applications: An International Journal)
Recursive Mahalanobis Separability Measure for Gene Subset Selection (IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB))
Robust Feature Selection for Microarray Data Based on Multicriterion Fusion (IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB))

Abstract

Motivation: Feature selection approaches, such as filter and wrapper methods, have been applied to the gene selection problem in the microarray data analysis literature. In wrapper methods, the classification error is usually used as the evaluation criterion for feature subsets. Because microarray data are high-dimensional with small sample sizes, however, counting-based error estimation is not necessarily an ideal criterion for the gene selection problem.

Results: Our study reveals that evaluating genes in terms of counting-based error estimators, such as resubstitution error, leave-one-out error, cross-validation error and bootstrap error, may encounter a severe ties problem, i.e. two or more gene subsets score equally, which in turn results in uncertainty in gene selection. Our analysis finds that the ties problem is caused by the discrete nature of counting-based error estimators and could be avoided by using continuous evaluation criteria instead. Experimental results show that continuous evaluation criteria, such as the generalised |w|^2 measure for support vector machines and the modified Relief's measure for k-nearest neighbors, produce improved gene selection compared with counting-based error estimators.

Availability: The companion website is at http://www.ntu.edu.sg/home5/pg02776030/wrappers/. The website contains (1) the source code of all the gene selection algorithms and (2) the complete set of tables and figures of experiments.

Contact: ekzmao@ntu.edu.sg
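The ties problem described in the abstract can be illustrated with a small sketch. With n samples, a counting-based estimator such as the leave-one-out error can take at most n + 1 distinct values, so when hundreds of genes are scored many of them must tie. The example below is an assumption-laden illustration, not the authors' code: the data are synthetic, each "gene subset" is a single gene, the classifier is 1-nearest-neighbour, and `relief_like_margin` is a simplified Relief-style stand-in for a continuous criterion, not the modified Relief's measure from the paper.

```python
import numpy as np

def loo_1nn_error(x, y):
    """Leave-one-out error count of a 1-NN classifier on one feature x."""
    d = np.abs(x[:, None] - x[None, :])
    np.fill_diagonal(d, np.inf)  # a sample may not be its own neighbour
    nearest = d.argmin(axis=1)
    return int(np.sum(y[nearest] != y))

def relief_like_margin(x, y):
    """Mean of (nearest-miss distance - nearest-hit distance); larger is better.

    Hypothetical simplified criterion used only for illustration.
    """
    d = np.abs(x[:, None] - x[None, :])
    np.fill_diagonal(d, np.inf)
    same = y[:, None] == y[None, :]
    hit = np.where(same, d, np.inf).min(axis=1)    # closest same-class sample
    miss = np.where(~same, d, np.inf).min(axis=1)  # closest other-class sample
    return float(np.mean(miss - hit))

# Synthetic "microarray": few samples, many genes (assumed sizes).
rng = np.random.default_rng(0)
n_samples, n_genes = 20, 500
y = np.array([0] * 10 + [1] * 10)
X = rng.normal(size=(n_samples, n_genes))
X[y == 1, :25] += 1.5  # make the first 25 genes informative

errors = np.array([loo_1nn_error(X[:, j], y) for j in range(n_genes)])
margins = np.array([relief_like_margin(X[:, j], y) for j in range(n_genes)])

n_distinct_errors = len(np.unique(errors))
n_distinct_margins = len(np.unique(margins))
n_tied_best_error = int(np.sum(errors == errors.min()))
n_tied_best_margin = int(np.sum(margins == margins.max()))

print("distinct error values:", n_distinct_errors, "of at most", n_samples + 1)
print("genes tied at the lowest error:", n_tied_best_error)
print("distinct margin values:", n_distinct_margins)
print("genes tied at the highest margin:", n_tied_best_margin)
```

Because the error count is an integer between 0 and n, ranking genes by it cannot separate more than n + 1 groups, whereas the real-valued margin will typically give a different score to every gene. How ties are broken in the counting-based case (first found, random, etc.) is then what actually determines the selected genes.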