Which is better: holdout or full-sample classifier design?

  • Authors:
  • Marcel Brun; Qian Xu; Edward R. Dougherty

  • Affiliations:
  • Computational Biology Division, Translational Genomics Research Institute, Phoenix, AZ; Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX; Computational Biology Division, Translational Genomics Research Institute, Phoenix, AZ and Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX

  • Venue:
  • EURASIP Journal on Bioinformatics and Systems Biology
  • Year:
  • 2008

Abstract

Is it better to design a classifier and estimate its error on the full sample or to design a classifier on a training subset and estimate its error on the holdout test subset? Full-sample design provides the better classifier; nevertheless, one might choose holdout with the hope of better error estimation. A conservative criterion to decide the best course is to aim at a classifier whose error is less than a given bound. Then the choice between full-sample and holdout designs depends on which possesses the smaller expected bound. Using this criterion, we examine the choice between holdout and several full-sample error estimators using covariance models and a patient-data model. Full-sample design consistently outperforms holdout design. The relation between the two designs is revealed via a decomposition of the expected bound into the sum of the expected true error and the expected conditional standard deviation of the true error.
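
The abstract's first claim, that full-sample design provides the better classifier, can be illustrated with a small simulation. The following is a minimal sketch and not the authors' experimental setup: it assumes a two-class Gaussian model with identity covariance and equal priors, and it swaps in a simple nearest-mean linear rule so that the true error can be computed in closed form. The function names (`design_nearest_mean`, `true_error`, `simulate`) and all parameter values are hypothetical choices made for illustration. It compares only the true errors of the designed classifiers and does not reproduce the paper's error estimators or its expected-bound criterion.

```python
import math
import numpy as np


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def design_nearest_mean(x0, x1):
    """Design a linear rule from class samples: predict class 1 when w @ x + b > 0."""
    m0, m1 = x0.mean(axis=0), x1.mean(axis=0)
    w = m1 - m0
    b = -0.5 * w @ (m0 + m1)
    return w, b


def true_error(w, b, mu0, mu1):
    """Exact error of the rule (w, b) under N(mu0, I) vs N(mu1, I), equal priors."""
    s = np.linalg.norm(w)
    e0 = phi((w @ mu0 + b) / s)    # class 0 points labelled as class 1
    e1 = phi(-(w @ mu1 + b) / s)   # class 1 points labelled as class 0
    return 0.5 * (e0 + e1)


def simulate(n_per_class=20, dim=5, delta=2.0, reps=1000, seed=0):
    """Compare full-sample design with design on half of the sample.

    All settings are illustrative assumptions, not values from the paper.
    Returns per-repetition true errors for both designs and the Bayes error.
    """
    rng = np.random.default_rng(seed)
    mu0 = np.zeros(dim)
    mu1 = np.zeros(dim)
    mu1[0] = delta
    half = n_per_class // 2
    full, hold = [], []
    for _ in range(reps):
        x0 = rng.normal(size=(n_per_class, dim)) + mu0
        x1 = rng.normal(size=(n_per_class, dim)) + mu1
        # Full-sample design: every point is used to build the classifier.
        full.append(true_error(*design_nearest_mean(x0, x1), mu0, mu1))
        # Holdout design: only the training half is used; the rest is held out.
        hold.append(true_error(*design_nearest_mean(x0[:half], x1[:half]), mu0, mu1))
    bayes = phi(-delta / 2.0)
    return np.array(full), np.array(hold), bayes


if __name__ == "__main__":
    full, hold, bayes = simulate()
    print("Bayes error:               ", round(bayes, 4))
    print("mean true error (full):    ", round(full.mean(), 4))
    print("mean true error (holdout): ", round(hold.mean(), 4))
```

Under these assumptions the average true error of the full-sample classifier should come out below that of the classifier designed on the training half, which is the sense in which full-sample design gives the better classifier. Whether holdout compensates through better error estimation is the question the paper addresses with its expected-bound criterion.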