Motivation: Estimation of misclassification error has received increasing attention in clinical diagnosis and in bioinformatics studies, especially in small-sample studies with microarray data. Current error-estimation methods are unsatisfactory because they have either large variability (such as leave-one-out cross-validation) or large bias (such as resubstitution and the leave-one-out bootstrap). Since small sample size remains a key feature of costly clinical investigations and of microarray studies with limited funding, time, and tissue material, accurate and easy-to-implement error-estimation methods for small samples are desirable and will be beneficial.

Results: A bootstrap cross-validation method is studied. It achieves accurate error estimation through a simple procedure based on bootstrap resampling, and its only cost is computer CPU time. Simulation studies and applications to microarray data demonstrate that it performs consistently better than its competitors. The method possesses several attractive properties: (1) it is implemented through a simple procedure; (2) it performs well for samples as small as 16; (3) it is not restricted to any particular classification rule and thus applies to many parametric and non-parametric methods.

Contact: wfu@stat.tamu.edu
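The procedure described above (repeated bootstrap resampling with cross-validation applied to each resample, averaged over resamples) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the nearest-mean classifier, the one-dimensional synthetic features, and the choice of leave-one-out cross-validation within each bootstrap sample are all assumptions made for the sake of a self-contained example.

```python
import random
from statistics import mean

def nearest_mean_classify(train, x):
    """Predict the label whose class mean is closest to x.

    train: list of (feature, label) pairs with scalar features.
    This simple rule stands in for any classifier; the bootstrap
    cross-validation wrapper below does not depend on its choice.
    """
    by_class = {}
    for xi, yi in train:
        by_class.setdefault(yi, []).append(xi)
    return min(by_class, key=lambda c: abs(x - mean(by_class[c])))

def loocv_error(data):
    """Leave-one-out cross-validation error rate on a dataset."""
    errors = 0
    for i in range(len(data)):
        held_x, held_y = data[i]
        train = data[:i] + data[i + 1:]
        if nearest_mean_classify(train, held_x) != held_y:
            errors += 1
    return errors / len(data)

def bootstrap_cv_error(data, B=50, seed=0):
    """Bootstrap cross-validation error estimate (sketch).

    For each of B bootstrap resamples (drawn with replacement, same
    size as the original sample), compute the leave-one-out CV error
    on that resample; return the average over resamples.
    """
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(B):
        boot = [data[rng.randrange(n)] for _ in range(n)]
        # Skip degenerate resamples that contain only one class.
        if len({y for _, y in boot}) < 2:
            continue
        estimates.append(loocv_error(boot))
    return mean(estimates)
```

As a usage sketch, one might draw a small two-class sample (e.g. n = 16, matching the smallest sample size mentioned in the abstract) and call `bootstrap_cv_error(data, B=50)`; the returned value is an error-rate estimate in [0, 1]. The only tuning choice is B, the number of bootstrap resamples, which trades CPU time against the variance of the estimate.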