The accuracy of error estimation is the salient issue for the validity of a classifier model. When samples are small, error estimates computed from the training data tend to be inaccurate, and quantifying their accuracy is difficult. Numerous methods have been proposed for estimating confidence intervals for the true error from the estimated error. This paper surveys these methods and quantifies their performance. Monte Carlo methods are used to obtain accurate estimates of the true confidence intervals, which are then compared with the intervals estimated from samples. We consider different error estimators and several proposed confidence-bound estimators, using both synthetic and real genomic data. Our simulations show that the majority of the confidence-interval methods perform poorly because the true and estimated intervals differ in shape. According to our results, the best estimation strategy is 10 repetitions of 10-fold cross-validation with a confidence interval based on the standard deviation.
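The recommended strategy can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say how the standard deviation is computed or how the interval is formed, so the code below assumes the standard deviation is taken across the 10 repetition-level estimates and that the interval is the mean plus or minus z standard deviations (z = 1.96, a normal approximation). The nearest-mean classifier, the synthetic Gaussian data, and all function names are hypothetical choices made only to keep the example self-contained.

```python
import random
import statistics


def nearest_mean_fit(X, y):
    """Return per-class mean vectors (a deliberately simple stand-in classifier)."""
    means = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        means[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return means


def nearest_mean_predict(means, x):
    """Assign x to the class whose mean is closest in squared Euclidean distance."""
    return min(means, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, means[c])))


def repeated_cv_error(X, y, k=10, repeats=10, seed=0):
    """Return one k-fold cross-validation error estimate per repetition."""
    rng = random.Random(seed)
    n = len(X)
    estimates = []
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)  # a fresh random partition for each repetition
        folds = [idx[i::k] for i in range(k)]
        wrong = 0
        for fold in folds:
            held_out = set(fold)
            train = [i for i in idx if i not in held_out]
            model = nearest_mean_fit([X[i] for i in train], [y[i] for i in train])
            wrong += sum(nearest_mean_predict(model, X[i]) != y[i] for i in fold)
        estimates.append(wrong / n)  # every sample is held out exactly once
    return estimates


def sd_interval(estimates, z=1.96):
    """Assumed interval: mean +/- z * SD across repetitions, clipped to [0, 1]."""
    mean = statistics.mean(estimates)
    sd = statistics.stdev(estimates)
    return max(0.0, mean - z * sd), mean, min(1.0, mean + z * sd)


def make_data(n_per_class=20, dim=5, shift=1.0, seed=1):
    """Hypothetical small-sample data: two unit-variance Gaussian classes."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(label * shift, 1.0) for _ in range(dim)])
            y.append(label)
    return X, y


X, y = make_data()
estimates = repeated_cv_error(X, y)
low, mean, high = sd_interval(estimates)
print(f"10x10-fold CV error: {mean:.3f}, interval: [{low:.3f}, {high:.3f}]")
```

The spread across repetitions reflects only the variability due to random partitioning of one fixed sample, so an interval built this way should be read as a rough indication rather than a guaranteed coverage statement, which is consistent with the caution the abstract expresses about estimated intervals in small samples.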