Cross-validation and bootstrapping are unreliable in small sample classification

Authors:
A. Isaksson;M. Wallman;H. Göransson;M. G. Gustafsson
Affiliations:
Department of Medical Sciences, Uppsala University, Academic Hospital, SE-751 85 Uppsala, Sweden;Department of Medical Sciences, Uppsala University, Academic Hospital, SE-751 85 Uppsala, Sweden and Fraunhofer Chalmers Research Centre for Industrial Mathematics, Gothenburg, Sweden;Department of Medical Sciences, Uppsala University, Academic Hospital, SE-751 85 Uppsala, Sweden;Department of Medical Sciences, Uppsala University, Academic Hospital, SE-751 85 Uppsala, Sweden and Department of Engineering Sciences, Uppsala University, P.O. Box 534, SE-751 21 Uppsala, Swede ...
Venue:
Pattern Recognition Letters
Year:
2008

Abstract

Interest in statistical classification for critical applications, such as diagnosis of patient samples based on supervised learning, is growing rapidly. To gain acceptance in applications where the subsequent decisions have serious consequences, e.g. choice of cancer therapy, any such decision support system must come with a reliable performance estimate. Tailored for small sample problems, cross-validation (CV) and bootstrapping (BTS) have been the most commonly used methods for determining such estimates in virtually all branches of science for the last 20 years. Here, we address the often overlooked fact that the uncertainty in a point estimate obtained with CV or BTS is unknown and quite large for the small sample classification problems encountered in biomedical applications and elsewhere. To avoid this fundamental problem of employing CV and BTS, until improved alternatives have been established, we suggest that the final classification performance should always be reported in the form of a Bayesian confidence interval obtained from a simple holdout test, or using some other method that yields conservative measures of the uncertainty.
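
To illustrate the kind of interval the abstract recommends, the sketch below computes a Bayesian interval for the true error rate from a holdout test. It is not taken from the paper: it assumes a uniform Beta(1, 1) prior on the error rate and an equal-tailed interval, and the function names `beta_posterior_cdf` and `holdout_credible_interval` are hypothetical. With `errors` misclassifications among `n` independent holdout samples, the posterior under these assumptions is Beta(errors + 1, n - errors + 1).

```python
from math import comb


def beta_posterior_cdf(x, errors, n):
    """CDF at x of Beta(errors + 1, n - errors + 1).

    This is the posterior of the error rate under an ASSUMED uniform prior
    after observing `errors` mistakes on `n` holdout samples. For integer
    parameters the regularized incomplete beta function equals a binomial
    tail: I_x(a, b) = P(Binomial(a + b - 1, x) >= a).
    """
    m = n + 1
    lower_tail = sum(
        comb(m, j) * x**j * (1 - x) ** (m - j) for j in range(errors + 1)
    )
    return 1.0 - lower_tail


def holdout_credible_interval(errors, n, level=0.95, tol=1e-10):
    """Equal-tailed Bayesian interval for the true error rate (a sketch)."""
    if not 0 <= errors <= n:
        raise ValueError("errors must lie between 0 and n")

    def quantile(p):
        # Bisection works because the CDF is increasing in x on [0, 1].
        lo, hi = 0.0, 1.0
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if beta_posterior_cdf(mid, errors, n) < p:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    alpha = 1.0 - level
    return quantile(alpha / 2), quantile(1 - alpha / 2)


if __name__ == "__main__":
    # Hypothetical holdout result: 2 errors on 20 test samples.
    low, high = holdout_credible_interval(errors=2, n=20)
    print(f"95% interval for the error rate: [{low:.3f}, {high:.3f}]")
```

Running the function with a small `n` and then with a larger one at the same observed error fraction shows how slowly the interval narrows, which is the practical reason the abstract asks for an interval rather than a point estimate.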