The reliability of estimated confidence intervals for classification error rates when only a single sample is available

Authors:
Blaise Hanczar; Edward R. Dougherty
Affiliations:
LIPADE, University Paris Descartes, 45 rue des saint-peres, 75006 Paris, France; Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA and Computational Biology Division, Translational Genomics Research Institute, Phoenix, AZ, USA
Venue:
Pattern Recognition
Year:
2013

Abstract

Error estimation accuracy is the salient issue regarding the validity of a classifier model. When samples are small, error estimates based on the training data tend to be inaccurate, and quantifying their accuracy is difficult. Numerous methods have been proposed for estimating confidence intervals for the true error based on the estimated error. This paper surveys the proposed methods and quantifies their performance. Monte Carlo methods are used to obtain accurate estimates of the true confidence intervals, which are then compared with the intervals estimated from samples. We consider different error estimators and several proposed confidence-bound estimators. Both synthetic and real genomic data are employed. Our simulations show that the majority of the confidence-interval methods perform poorly because of the difference in shape between the true and estimated intervals. According to our results, the best estimation strategy is to use 10 repetitions of 10-fold cross-validation with a confidence interval based on the standard deviation.
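
The strategy recommended at the end of the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the interval formula, so the sketch assumes a normal-approximation interval, mean ± z·s, where s is the sample standard deviation of the per-repetition cross-validation error estimates. The nearest-mean classifier, the synthetic two-class Gaussian data, and all function names are hypothetical stand-ins chosen to keep the example self-contained.

```python
import random
import statistics


def make_data(n_per_class=25, dim=5, shift=1.0, seed=1):
    """Hypothetical synthetic data: two Gaussian classes with shifted means."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(label * shift, 1.0) for _ in range(dim)])
            y.append(label)
    return X, y


def nearest_mean_fit(X, y):
    """Stand-in classifier: store the per-class feature means."""
    means = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        means[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return means


def nearest_mean_predict(means, x):
    """Assign x to the class whose mean is closest in squared distance."""
    return min(means, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, means[c])))


def repeated_cv_errors(X, y, k=10, repeats=10, seed=0):
    """Return one k-fold cross-validation error estimate per repetition."""
    rng = random.Random(seed)
    n = len(X)
    errors = []
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]  # k disjoint folds
        wrong = 0
        for fold in folds:
            held_out = set(fold)
            train = [i for i in idx if i not in held_out]
            model = nearest_mean_fit([X[i] for i in train], [y[i] for i in train])
            wrong += sum(nearest_mean_predict(model, X[i]) != y[i] for i in fold)
        errors.append(wrong / n)
    return errors


def std_based_interval(errors, level=0.95):
    """Assumed form: mean +/- z * sample std of the repetition estimates."""
    center = statistics.mean(errors)
    spread = statistics.stdev(errors)
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return max(0.0, center - z * spread), center, min(1.0, center + z * spread)


if __name__ == "__main__":
    X, y = make_data()
    errs = repeated_cv_errors(X, y, k=10, repeats=10)
    low, est, high = std_based_interval(errs)
    print(f"estimated error {est:.3f}, 95% interval [{low:.3f}, {high:.3f}]")
```

The interval is clipped to [0, 1] because an error rate cannot fall outside that range. Note that the spread across repetitions on one sample reflects only the variability due to the random partitioning, which is one reason an interval of this kind may differ in shape from the true one.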