Bagging using statistical queries

Authors:
Anneleen Van Assche; Hendrik Blockeel
Affiliations:
Computer Science Department, Katholieke Universiteit Leuven, Leuven, Belgium; Computer Science Department, Katholieke Universiteit Leuven, Leuven, Belgium
Venue:
ECML'06 Proceedings of the 17th European conference on Machine Learning
Year:
2006

Abstract

Bagging is an ensemble method that relies on random resampling of a data set to construct the models in the ensemble. When only statistics about the data are available, but no individual examples, the straightforward resampling procedure cannot be implemented. The question is then whether bagging can somehow be simulated. In this paper we propose a method that, instead of computing certain heuristics (such as information gain) from a resampled version of the data, estimates the probability distribution of these heuristics under random resampling and then samples from this distribution. The resulting method is not entirely equivalent to bagging because it ignores certain dependencies among statistics. Nevertheless, experiments show that this “simulated bagging” achieves accuracy similar to that of bagging, while being as efficient and more generally applicable.
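
The idea of drawing a heuristic from its resampling distribution, rather than from resampled examples, can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes the only available statistic is a contingency table of attribute value by class, and it approximates a bootstrap sample by drawing the same number of examples from a multinomial over the table cells before computing information gain. The function names (`resample_table`, `simulated_gain`) and the example counts are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(table):
    """Information gain of a split; table[v][c] counts examples
    with attribute value v and class c."""
    class_totals = [sum(col) for col in zip(*table)]
    n = sum(class_totals)
    if n == 0:
        return 0.0
    remainder = sum(sum(row) / n * entropy(row) for row in table)
    return entropy(class_totals) - remainder


def resample_table(table, rng):
    """Assumption of this sketch: imitate a bootstrap sample by drawing
    n cells with replacement, with probability proportional to the
    observed counts (a multinomial draw). No individual examples are used."""
    cells = [(v, c) for v, row in enumerate(table) for c in range(len(row))]
    weights = [table[v][c] for v, c in cells]
    n = sum(weights)
    drawn = Counter(rng.choices(cells, weights=weights, k=n))
    return [[drawn[(v, c)] for c in range(len(row))]
            for v, row in enumerate(table)]


def simulated_gain(table, rng):
    """One sample of information gain under simulated resampling."""
    return information_gain(resample_table(table, rng))


if __name__ == "__main__":
    rng = random.Random(0)
    counts = [[30, 10], [5, 25]]  # hypothetical value-by-class counts
    samples = [simulated_gain(counts, rng) for _ in range(5)]
    print("gain on original counts:", information_gain(counts))
    print("sampled gains:", samples)
```

In this sketch each attribute's table would be resampled independently, so the sampled counts for different attributes need not correspond to any single resampled data set. That is one way to picture why such a simulation is not entirely equivalent to bagging.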