When efficient model averaging out-performs boosting and bagging

Authors:
Ian Davidson; Wei Fan
Affiliations:
State University of New York, Albany, NY; IBM T.J. Watson, NY
Venue:
PKDD'06: Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases
Year:
2006

Abstract

The Bayes optimal classifier (BOC) is an ensemble technique used extensively in the statistics literature. However, compared to other ensemble techniques such as bagging and boosting, BOC is less well known and rarely used in data mining. This is partly because BOC is perceived as inefficient and partly because bagging and boosting consistently outperform a single model, which raises the question: “Do we even need BOC in data mining?” We show that the answer to this question is “yes” by illustrating that several recent efficient model averaging approximations to BOC can significantly outperform bagging and boosting in realistic situations such as extensive class label noise, sample selection bias, and many-class problems. To our knowledge, the finding that model averaging techniques outperform bagging and boosting in these situations has not previously been published in the machine learning, data mining, or statistics communities.
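
For readers unfamiliar with the term, the BOC predicts with the posterior-weighted average of every model's class probabilities: P(y | x, D) = sum over models m of P(y | x, m) P(m | D). The abstract does not say which approximations the authors use, so the following is only a minimal sketch of the general idea and not the paper's method. The function names `model_average` and `predict`, and the default of uniform weights standing in for P(m | D), are assumptions made for illustration.

```python
def model_average(prob_vectors, weights=None):
    """Average per-model class-probability vectors into one vector.

    prob_vectors: one list of class probabilities P(y | x, m) per model m.
    weights: optional per-model weights standing in for P(m | D).
             Uniform weights are assumed when omitted (an illustrative
             simplification, not something stated in the abstract).
    """
    n_models = len(prob_vectors)
    if weights is None:
        weights = [1.0 / n_models] * n_models
    total = sum(weights)  # normalize so the weights sum to one
    n_classes = len(prob_vectors[0])
    averaged = [0.0] * n_classes
    for probs, w in zip(prob_vectors, weights):
        for k, p in enumerate(probs):
            averaged[k] += (w / total) * p
    return averaged


def predict(prob_vectors, weights=None):
    """Return the index of the class with the highest averaged probability."""
    averaged = model_average(prob_vectors, weights)
    return max(range(len(averaged)), key=averaged.__getitem__)


# Three hypothetical models scoring one example on a two-class problem.
example = [[0.9, 0.1], [0.4, 0.6], [0.4, 0.6]]
print(model_average(example))
print(predict(example))
```

In this toy input, two of the three models favor class 1, so a majority vote over hard predictions would pick class 1. Averaging the probabilities instead picks class 0, because the first model is far more confident. That difference between averaging probabilities and voting on labels is the kind of behavior that separates model averaging from vote-based ensembles.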