Distributed learning with bagging-like performance

  • Authors:
  • Nitesh V. Chawla, Thomas E. Moore, Lawrence O. Hall, Kevin W. Bowyer, W. Philip Kegelmeyer, Clayton Springer

  • Affiliations:
  • Department of Computer Science and Engineering, University of South Florida, 4202 East Fowler Avenue, Tampa, FL (Chawla, Moore, Hall)
  • Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN (Bowyer)
  • Sandia National Laboratories, Biosystems Research Department, P.O. Box 969, MS 9951, Livermore, CA (Kegelmeyer, Springer)

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2003

Abstract

Bagging forms a committee of classifiers by bootstrap aggregation of training sets drawn from a pool of training data. A simple alternative to bagging is to partition the data into disjoint subsets. Experiments with decision tree and neural network classifiers on various datasets show that, given partitions and bags of the same size, disjoint partitions result in performance equivalent to, or better than, bootstrap aggregates (bags). Many applications (e.g., protein structure prediction) involve datasets that are too large to fit in the memory of a typical computer, so bagging with samples the size of the full dataset is impractical. Our results indicate that, in such applications, the simple approach of creating a committee of n classifiers from disjoint partitions, each of size 1/n (and hence memory resident during learning), in a distributed way yields a classifier with a bagging-like performance gain. The use of distributed disjoint partitions in learning is significantly less complex and faster than bagging.
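The contrast the abstract draws can be sketched in code. Below is a minimal, hedged illustration (not the paper's experimental setup): a committee of scikit-learn decision trees trained on n disjoint 1/n-sized partitions, compared against a committee trained on n bootstrap bags of the same size, both combined by majority vote. The dataset, classifier parameters, and committee size are assumptions chosen for illustration.

```python
# Sketch: disjoint-partition committee vs. same-size bootstrap bags,
# each combined by simple majority vote. Illustrative only; the paper's
# experiments used C4.5-style trees and neural networks on other datasets.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=6000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

n = 8  # committee size; each disjoint partition holds 1/n of the training data

def majority_vote(models, X):
    # Stack each member's binary predictions and take the majority class.
    votes = np.stack([m.predict(X) for m in models])
    return (votes.mean(axis=0) >= 0.5).astype(int)

# Disjoint partitions: shuffle once, then split into n non-overlapping chunks.
# Each chunk could be trained on a separate machine (the distributed setting).
idx = rng.permutation(len(X_tr))
parts = np.array_split(idx, n)
disjoint = [DecisionTreeClassifier(random_state=i).fit(X_tr[p], y_tr[p])
            for i, p in enumerate(parts)]

# Bags of the same size: sample 1/n of the data *with* replacement, n times.
size = len(X_tr) // n
bags = []
for i in range(n):
    samp = rng.integers(0, len(X_tr), size=size)
    bags.append(DecisionTreeClassifier(random_state=i).fit(X_tr[samp], y_tr[samp]))

acc_disjoint = float((majority_vote(disjoint, X_te) == y_te).mean())
acc_bagged = float((majority_vote(bags, X_te) == y_te).mean())
print(f"disjoint partitions: {acc_disjoint:.3f}  bags: {acc_bagged:.3f}")
```

Note the structural difference: the disjoint committee touches each training example exactly once across all members, while same-size bags resample with replacement, so examples repeat within a bag and across bags. That is why the disjoint scheme is cheaper and parallelizes trivially.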