The ties problem resulting from counting-based error estimators and its impact on gene selection algorithms

  • Authors:
  • Xin Zhou; K. Z. Mao

  • Affiliations:
  • School of Electrical & Electronic Engineering, Nanyang Technological University, Nanyang Avenue, Singapore 639798, Singapore (both authors)

  • Venue:
  • Bioinformatics
  • Year:
  • 2006


Abstract

Motivation: Feature selection approaches, such as filter and wrapper methods, have been applied to the gene selection problem in microarray data analysis. In wrapper methods, the classification error is usually the criterion used to evaluate feature subsets. Because microarray data are high-dimensional and have small sample sizes, however, counting-based error estimation is not necessarily an ideal criterion for gene selection.

Results: Our study reveals that evaluating genes with counting-based error estimators, such as the resubstitution, leave-one-out, cross-validation and bootstrap errors, may run into a severe ties problem. In a tie, two or more gene subsets receive the same score, which in turn makes gene selection uncertain. Our analysis finds that the ties problem is caused by the discrete nature of counting-based error estimators and could be avoided by using continuous evaluation criteria instead. Experimental results show that continuous evaluation criteria produce better gene selection than counting-based error estimators. Examples are the generalised ‖w‖² measure for support vector machines and a modified Relief measure for k-nearest neighbors.

Availability: The companion website is at http://www.ntu.edu.sg/home5/pg02776030/wrappers/. It contains (1) the source code of all the gene selection algorithms and (2) the complete set of tables and figures from the experiments.

Contact: ekzmao@ntu.edu.sg
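The ties problem follows from simple counting. With n samples, a leave-one-out error can only take one of the n + 1 values 0/n, 1/n, …, n/n. When many candidate gene subsets are scored, many of them must therefore share the same score.

The sketch below contrasts that counting-based score with a continuous one. It rests on the following assumptions:

- The data set is synthetic: 20 samples, 200 genes, and a weak signal placed on genes 0 to 4.
- The classifier is 1-NN.
- `relief_margin` is a hypothetical Relief-style margin: the mean of nearest-miss distance minus nearest-hit distance. It is a stand-in, not the authors' modified Relief measure. Their actual code is on the companion website.

```python
import random

def make_data(n_per_class=10, n_genes=200, seed=0):
    """Synthetic two-class data (assumption): Gaussian noise, genes 0-4 shifted for class 1."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            for g in range(5):  # weak class signal on a few genes (hypothetical setup)
                row[g] += 1.0 * label
            X.append(row)
            y.append(label)
    return X, y

def dist(a, b, genes):
    """Euclidean distance restricted to the given gene subset."""
    return sum((a[g] - b[g]) ** 2 for g in genes) ** 0.5

def loo_error_1nn(X, y, genes):
    """Counting-based leave-one-out error of 1-NN: (#misclassified samples) / n."""
    n = len(X)
    errors = 0
    for i in range(n):
        # Nearest neighbour of sample i among all other samples.
        j = min((k for k in range(n) if k != i), key=lambda k: dist(X[i], X[k], genes))
        errors += int(y[j] != y[i])
    return errors / n

def relief_margin(X, y, genes):
    """Continuous criterion (hypothetical Relief-style margin):
    mean over samples of (nearest-miss distance - nearest-hit distance)."""
    n = len(X)
    total = 0.0
    for i in range(n):
        hit = min(dist(X[i], X[k], genes) for k in range(n) if k != i and y[k] == y[i])
        miss = min(dist(X[i], X[k], genes) for k in range(n) if y[k] != y[i])
        total += miss - hit
    return total / n

X, y = make_data()
subsets = [[g] for g in range(len(X[0]))]  # score every single-gene subset
loo = [loo_error_1nn(X, y, s) for s in subsets]
margin = [relief_margin(X, y, s) for s in subsets]

print("distinct LOO error values:", len(set(loo)), "for", len(subsets), "subsets")
print("subsets tied at the best LOO error:", loo.count(min(loo)))
print("distinct margin values:", len(set(margin)))
print("subsets tied at the best margin:", margin.count(max(margin)))
```

The LOO errors are confined to a grid of at most 21 values, so the 200 candidate subsets collapse into a few score levels. The real-valued margin almost surely assigns a distinct score to each subset, which lets a wrapper rank candidates that the counting estimator cannot separate.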