The peaking phenomenon in the presence of feature-selection

  • Authors:
  • Chao Sima;Edward R. Dougherty

  • Affiliations:
  • Computational Biology Division, Translational Genomics Research Institute, Phoenix, AZ 85004, USA;Computational Biology Division, Translational Genomics Research Institute, Phoenix, AZ 85004, USA and Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 7 ...

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2008

Quantified Score

Hi-index 0.10

Visualization

Abstract

For a fixed sample size, a common phenomenon is that the error of a designed classifier decreases and then increases as the number of features grows. This peaking phenomenon has been recognized for forty years and depends on the classification rule and feature-label distribution. Historically, the peaking phenomenon has been treated by assuming a fixed ordering of the features, usually beginning with the strongest individual feature and proceeding with features of decreasing individual classification capability. This does not take into account feature-selection, which is commonplace in high-dimensional and small sample settings. This paper revisits the peaking phenomenon in the presence of feature-selection. Using massive simulation in a high-performance computing environment, the paper considers various combinations of feature-label models, feature-selection algorithms, and classifier models to produce a large library of error versus feature size curves. Owing to the prevalence of feature-selection in genomic classification, we also consider gene-expression-based classification of breast-cancer patient prognosis. Results vary widely and are strongly dependent on the combination. The error curves tend to fall into three categories: peaking, settling into a plateau, or falling very slowly over a long range of feature set sizes. It can be concluded that one should be wary of applying peaking results found in the absence of feature-selection to settings in which feature-selection is employed.