Is bagging effective in the classification of small-sample genomic and proteomic data?

Authors:
T. T. Vu;U. M. Braga-Neto
Affiliations:
Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX;Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX
Venue:
EURASIP Journal on Bioinformatics and Systems Biology - Special issue on applications of signal procesing techniques to bioinformatics, genomics, and proteomics
Year:
2009

Citing 12
Cited 1

The Strength of Weak Learnability

Machine Learning
Boosting a weak learning algorithm by majority

COLT '90 Proceedings of the third annual workshop on Computational learning theory
Bagging predictors

Machine Learning
An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization

Machine Learning
Random Forests

Machine Learning
An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants

Machine Learning
Neural Network Ensembles

IEEE Transactions on Pattern Analysis and Machine Intelligence
Proteomic mass spectra classification using decision tree based ensemble methods

Bioinformatics
Experimental study for the comparison of classifier combination methods

Pattern Recognition
Performance of feature-selection methods in the classification of high-dimension data

Pattern Recognition
Bagging support vector machine for classification of SELDI-ToF mass spectra of ovarian cancer serum samples

AI'07 Proceedings of the 20th Australian joint conference on Advances in artificial intelligence
Application of majority voting to pattern recognition: an analysis of its behavior and performance

IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans

Small-sample error estimation for bagged classification rules

EURASIP Journal on Advances in Signal Processing - Special issue on genomic signal processing

Quantified Score

Hi-index	0.00

Visualization

Abstract

There has been considerable interest recently in the application of bagging in the classification of both gene-expression data and protein-abundance mass spectrometry data. The approach is often justified by the improvement it produces on the performance of unstable, overfitting classification rules under small-sample situations. However, the question of real practical interest is whether the ensemble scheme will improve performance of those classifiers sufficiently to beat the performance of single stable, nonoverfitting classifiers, in the case of small-sample genomic and proteomic data sets. To investigate that question, we conducted a detailed empirical study, using publicly-available data sets from published genomic and proteomic studies. We observed that, under t-test and RELIEF filter-based feature selection, bagging generally does a good job of improving the performance of unstable, overfitting classifiers, such as CART decision trees and neural networks, but that improvement was not sufficient to beat the performance of single stable, nonoverfitting classifiers, such as diagonal and plain linear discriminant analysis, or 3-nearest neighbors. Furthermore, as expected, the ensemble method did not improve the performance of these classifiers significantly. Representative experimental results are presented and discussed in this work.