Small-sample error estimation for bagged classification rules

Authors:
T. T. Vu; U. M. Braga-Neto
Affiliations:
Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX (both authors)
Venue:
EURASIP Journal on Advances in Signal Processing - Special issue on genomic signal processing
Year:
2010

Abstract

Ensemble classification rules are increasingly common in genomics and proteomics. However, error estimation for these rules, particularly for bagging in the small-sample settings prevalent in those fields, is not well understood. Breiman proposed the "out-of-bag" method for estimating statistics of bagged classifiers, and other authors subsequently applied it to estimate the classification error. In this paper, we give an explicit definition of the out-of-bag estimator that is intended to remove estimator bias by carefully formulating how the error count is normalized. We also report the results of an extensive simulation study of bagging applied to common classification rules, including LDA, 3NN, and CART, on both synthetic and real patient data. The study covers common error estimators such as resubstitution, leave-one-out, cross-validation, basic bootstrap, .632 bootstrap, .632+ bootstrap, bolstering, and semi-bolstering, in addition to the out-of-bag estimator. The numerical experiments indicate that the out-of-bag estimator performs very similarly to leave-one-out; in particular, it is slightly pessimistically biased. The other estimators behave consistently with their performance on the corresponding single classifiers, as reported in other studies.
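
To make the out-of-bag idea concrete, the following is a minimal sketch and not the estimator defined in the paper, whose exact normalization is not given in this abstract. It assumes one common convention: each training point is classified by majority vote over only those bootstrap classifiers whose sample did not contain it, and the error count is divided by the number of points that were left out at least once. The base rule (`train_nearest_mean`, a 1-D nearest-class-mean classifier), the function `oob_error`, and its parameters are hypothetical names chosen for illustration.

```python
import random
from collections import Counter


def train_nearest_mean(points, labels):
    """Hypothetical base rule: assign a 1-D point to the class with the nearest mean."""
    means = {}
    for c in set(labels):
        vals = [x for x, y in zip(points, labels) if y == c]
        means[c] = sum(vals) / len(vals)
    return lambda x: min(means, key=lambda c: abs(x - means[c]))


def oob_error(points, labels, n_bags=50, seed=0):
    """Out-of-bag error estimate for a bagged classifier (assumed convention).

    Each point is voted on only by classifiers trained on bootstrap samples
    that do not contain it. The error count is normalized by the number of
    points that were out of bag at least once, not by the full sample size.
    """
    rng = random.Random(seed)
    n = len(points)
    votes = [Counter() for _ in range(n)]
    for _ in range(n_bags):
        # Bootstrap sample: n indices drawn with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        in_bag = set(idx)
        clf = train_nearest_mean([points[i] for i in idx],
                                 [labels[i] for i in idx])
        # Only points outside this bootstrap sample receive a vote.
        for i in range(n):
            if i not in in_bag:
                votes[i][clf(points[i])] += 1
    errors = 0
    counted = 0
    for i in range(n):
        if votes[i]:  # skip points that were never out of bag
            counted += 1
            predicted = votes[i].most_common(1)[0][0]
            errors += int(predicted != labels[i])
    return errors / counted if counted else float("nan")


if __name__ == "__main__":
    # Two well-separated classes, five points each (toy data, not from the paper).
    xs = [0.0, 0.1, 0.2, 0.3, 0.4, 5.0, 5.1, 5.2, 5.3, 5.4]
    ys = [0] * 5 + [1] * 5
    print(oob_error(xs, ys))
```

The choice of denominator is the design point: dividing by the full sample size would count never-out-of-bag points as correctly classified, while dividing by the number of points actually voted on avoids that. Whether this matches the paper's normalization is an assumption.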