Bagging is an ensemble learning method that has proved to be a useful tool in the arsenal of machine learning practitioners. Commonly applied in conjunction with decision tree learners to build an ensemble of decision trees, it often yields lower prediction error than a single tree. A single tree is built from a training set of size N. Bagging is based on the idea that, ideally, we would like to eliminate the variance due to a particular training set by combining trees built from all training sets of size N. In practice, however, only one training set is available, and bagging simulates this idealized procedure by sampling with replacement from the original training data to form new training sets. In this paper we pursue the idea of sampling from a kernel density estimator of the underlying distribution to form new training sets, in addition to sampling from the data itself. This can be viewed as “smearing out” the resampled training data to generate new datasets, with the amount of “smear” controlled by a parameter. We show that the resulting method, called “input smearing”, can lead to improved results when compared to bagging. We present results for both classification and regression problems.
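Sampling from a Gaussian kernel density estimator centered on the training points amounts to drawing a bootstrap sample and then adding Gaussian noise to each input. The sketch below illustrates this smeared resampling step; the function name `smeared_bootstrap` and the smear parameter `p` (noise scale as a fraction of each attribute's standard deviation) are illustrative choices, not the paper's notation.

```python
import numpy as np

def smeared_bootstrap(X, y, p=0.3, rng=None):
    """Draw one smeared bootstrap sample (a sketch of input smearing).

    Each row is resampled with replacement, then Gaussian noise with
    per-attribute standard deviation p * std(attribute) is added to the
    inputs. With p = 0 this reduces to an ordinary bagging sample.
    """
    rng = np.random.default_rng(rng)
    n = len(X)
    idx = rng.integers(0, n, size=n)       # sample rows with replacement
    Xb = X[idx].astype(float)
    sigma = X.std(axis=0)                  # per-attribute spread of the data
    Xb += rng.normal(0.0, 1.0, size=Xb.shape) * (p * sigma)
    return Xb, y[idx]
```

An ensemble would repeat this for each base learner, training one tree per smeared sample; only the inputs are perturbed, while the class labels or targets are copied from the resampled rows unchanged.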