Which is better: holdout or full-sample classifier design?

Authors:
Marcel Brun; Qian Xu; Edward R. Dougherty
Affiliations:
Computational Biology Division, Translational Genomics Research Institute, Phoenix, AZ; Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX; Computational Biology Division, Translational Genomics Research Institute, Phoenix, AZ and Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX
Venue:
EURASIP Journal on Bioinformatics and Systems Biology
Year:
2008



Abstract

Is it better to design a classifier and estimate its error on the full sample or to design a classifier on a training subset and estimate its error on the holdout test subset? Full-sample design provides the better classifier; nevertheless, one might choose holdout with the hope of better error estimation. A conservative criterion to decide the best course is to aim at a classifier whose error is less than a given bound. Then the choice between full-sample and holdout designs depends on which possesses the smaller expected bound. Using this criterion, we examine the choice between holdout and several full-sample error estimators using covariance models and a patient-data model. Full-sample design consistently outperforms holdout design. The relation between the two designs is revealed via a decomposition of the expected bound into the sum of the expected true error and the expected conditional standard deviation of the true error.
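The comparison described in the abstract can be illustrated with a small Monte Carlo sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the two-class Gaussian model with identity covariance, the nearest-mean classifier, resubstitution as the full-sample error estimator, the 50/50 holdout split, and the reading of "conditional" as conditioning on the value of the error estimate. The function names (`true_error`, `design`, `empirical_error`, `expected_bound`, `simulate`) are hypothetical. The sketch reports, for each design, the two terms of the decomposition: the expected true error and the expected conditional standard deviation of the true error.

```python
import math
import numpy as np

def true_error(w, b, mu0, mu1):
    """Exact error of the rule 'label 1 if w.x + b > 0' for N(mu0, I) vs N(mu1, I), equal priors."""
    phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF
    norm = np.linalg.norm(w)
    err0 = phi((w @ mu0 + b) / norm)    # class-0 points labeled 1
    err1 = phi(-(w @ mu1 + b) / norm)   # class-1 points labeled 0
    return 0.5 * (err0 + err1)

def design(x0, x1):
    """Nearest-mean classifier (an assumed, simple classification rule)."""
    m0, m1 = x0.mean(axis=0), x1.mean(axis=0)
    w = m1 - m0
    b = -w @ (m0 + m1) / 2.0
    return w, b

def empirical_error(w, b, x0, x1):
    """Fraction of the given labeled points that the classifier gets wrong."""
    wrong = np.sum(x0 @ w + b > 0) + np.sum(x1 @ w + b <= 0)
    return wrong / (len(x0) + len(x1))

def expected_bound(true_errs, estimates):
    """Return (E[true error], E[SD(true error | estimate)]) from simulated pairs."""
    true_errs, estimates = np.asarray(true_errs), np.asarray(estimates)
    cond_sd = 0.0
    for value in np.unique(estimates):
        mask = estimates == value
        # Weight each conditional SD by the probability of that estimate value.
        cond_sd += mask.mean() * true_errs[mask].std()
    return true_errs.mean(), cond_sd

def simulate(reps=2000, n_per_class=20, dim=5, shift=0.6, seed=0):
    rng = np.random.default_rng(seed)
    mu0, mu1 = np.zeros(dim), np.full(dim, shift)
    half = n_per_class // 2
    full_true, full_est, hold_true, hold_est = [], [], [], []
    for _ in range(reps):
        x0 = rng.normal(size=(n_per_class, dim)) + mu0
        x1 = rng.normal(size=(n_per_class, dim)) + mu1
        # Full-sample design: train on everything, estimate by resubstitution.
        w, b = design(x0, x1)
        full_true.append(true_error(w, b, mu0, mu1))
        full_est.append(empirical_error(w, b, x0, x1))
        # Holdout design: train on the first half, estimate on the second half.
        w, b = design(x0[:half], x1[:half])
        hold_true.append(true_error(w, b, mu0, mu1))
        hold_est.append(empirical_error(w, b, x0[half:], x1[half:]))
    return expected_bound(full_true, full_est), expected_bound(hold_true, hold_est)

if __name__ == "__main__":
    (f_mean, f_sd), (h_mean, h_sd) = simulate()
    print(f"full-sample: E[error]={f_mean:.4f}  E[cond. SD]={f_sd:.4f}  sum={f_mean + f_sd:.4f}")
    print(f"holdout:     E[error]={h_mean:.4f}  E[cond. SD]={h_sd:.4f}  sum={h_mean + h_sd:.4f}")
```

Under these assumptions the first term should favor full-sample design, since its classifier is trained on twice as many points. How the second term compares depends on the model and the error estimator, which is the question the paper studies with its covariance and patient-data models.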