Supervised analysis when the number of candidate features (p) greatly exceeds the number of cases (n)

  • Authors: Richard Simon
  • Affiliations: National Cancer Institute, Bethesda, MD
  • Venue: ACM SIGKDD Explorations Newsletter
  • Year: 2003

Abstract

New genomic and proteomic technologies provide measurements of thousands of features for each case. This provides a context for both enhanced discovery and false discovery. Most statistical and machine learning procedures were not developed for the p ≫ n setting, and the literature of DNA microarray studies contains many examples of misuse of analytic and computational methods such as cross-validation. This paper highlights some of the key aspects of p ≫ n problems for identifying informative features and developing accurate classifiers.
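
The abstract does not say which misuses of cross-validation it has in mind. One frequently cited example in the p ≫ n setting is selecting features on the full data set and only then cross-validating the classifier, so the held-out cases have already influenced the model. The sketch below illustrates that effect under stated assumptions: the data are pure Gaussian noise with labels unrelated to the features, the classifier is a simple nearest-centroid rule, and the names `top_features`, `predict`, and `loocv_accuracy` are hypothetical helpers written for this illustration, not taken from the paper.

```python
import random

def top_features(X, y, rows, k):
    """Rank features by absolute difference of class means, using only `rows`."""
    pos = [i for i in rows if y[i] == 1]
    neg = [i for i in rows if y[i] == 0]

    def score(j):
        mean_pos = sum(X[i][j] for i in pos) / len(pos)
        mean_neg = sum(X[i][j] for i in neg) / len(neg)
        return abs(mean_pos - mean_neg)

    return sorted(range(len(X[0])), key=score, reverse=True)[:k]

def predict(X, y, train, test_row, feats):
    """Nearest-centroid prediction for one held-out case."""
    dist = {}
    for label in (0, 1):
        members = [i for i in train if y[i] == label]
        centroid = [sum(X[i][j] for i in members) / len(members) for j in feats]
        dist[label] = sum((X[test_row][j] - c) ** 2 for j, c in zip(feats, centroid))
    return min(dist, key=dist.get)

def loocv_accuracy(X, y, k, select_inside):
    """Leave-one-out accuracy; features are chosen inside or outside the loop."""
    all_rows = list(range(len(X)))
    # Selection on all cases: the held-out case leaks into the chosen features.
    global_feats = top_features(X, y, all_rows, k)
    correct = 0
    for i in all_rows:
        train = [r for r in all_rows if r != i]
        feats = top_features(X, y, train, k) if select_inside else global_feats
        correct += int(predict(X, y, train, i, feats) == y[i])
    return correct / len(X)

# Assumed toy setting: n = 20 cases, p = 1000 noise features, no real signal.
rng = random.Random(0)
n, p, k = 20, 1000, 10
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [i % 2 for i in range(n)]  # labels are independent of X by construction

biased = loocv_accuracy(X, y, k, select_inside=False)
honest = loocv_accuracy(X, y, k, select_inside=True)
print(f"selection outside the loop: {biased:.2f}")
print(f"selection inside the loop:  {honest:.2f}")
```

Because the labels carry no information, any accuracy well above chance is an artifact of the procedure. One would expect the estimate with selection outside the loop to look far better than the estimate with selection repeated inside each fold, which should stay near 50%.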