The distributed boosting algorithm

Authors:
Aleksandar Lazarevic;Zoran Obradovic
Affiliations:
Temple University, Philadelphia, PA;Temple University, Philadelphia, PA
Venue:
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2001

Citing (3):
Incremental batch learning (Proceedings of the sixth international workshop on Machine learning)
The application of AdaBoost for distributed, scalable and on-line learning (KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining)
Comparison of neural networks and discriminant analysis in predicting forest cover types

Cited by (16):
Distributed Pasting of Small Votes (MCS '02 Proceedings of the Third International Workshop on Multiple Classifier Systems)
Sharing Classifiers among Ensembles from Related Problem Domains (ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining)
On the optimal working set size in serial and parallel support vector machine learning with the decomposition algorithm (AusDM '06 Proceedings of the fifth Australasian conference on Data mining and analytics - Volume 61)
Distributed classification in peer-to-peer networks (Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining)
Induction of multiclass multifeature split decision trees from distributed data (Pattern Recognition)
Online parallel boosting (AAAI'04 Proceedings of the 19th national conference on Artificial intelligence)
A collaborative training algorithm for distributed learning (IEEE Transactions on Information Theory)
PLANET: massively parallel learning of tree ensembles with MapReduce (Proceedings of the VLDB Endowment)
A comparison study of strategies for combining classifiers from distributed data sources (ICANNGA'09 Proceedings of the 9th international conference on Adaptive and natural computing algorithms)
Hierarchical distributed data classification in wireless sensor networks (Computer Communications)
An A-Team approach to learning classifiers from distributed data sources (International Journal of Intelligent Information and Database Systems)
Distributed learning with data reduction (Transactions on computational collective intelligence IV)
Hierarchical aggregate classification with limited supervision for data reduction in wireless sensor networks (Proceedings of the 9th ACM Conference on Embedded Networked Sensor Systems)
Network game and boosting (ECML'05 Proceedings of the 16th European conference on Machine Learning)
HyParSVM: a new hybrid parallel software for support vector machine learning on SMP clusters (Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing)
Peer-to-peer distributed text classifier learning in PADMINI (Statistical Analysis and Data Mining)

Abstract

In this paper, we propose a general framework for distributed boosting, intended for efficiently integrating specialized classifiers learned over very large, distributed, homogeneous databases that cannot be merged at a single location. Our distributed boosting algorithm can also be used as a parallel classification technique, where a massive database that cannot fit into main memory is partitioned into disjoint subsets for more efficient analysis. In the proposed method, at each boosting round the classifiers are first learned from the disjoint data sets and then exchanged among the sites. Finally, the classifiers are combined into a weighted voting ensemble on each disjoint data set. The ensemble applied to an unseen test set is thus an ensemble of the ensembles built on all distributed sites. In experiments performed on four large data sets, the proposed distributed boosting method achieved classification accuracy comparable to, or even slightly better than, that of the standard boosting algorithm, while requiring less memory and less computational time. In addition, the communication overhead of the distributed boosting algorithm is very small, making it a viable alternative to standard boosting for large-scale databases.
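The round structure described in the abstract (learn locally, exchange classifiers, build a weighted voting ensemble at each site, then combine the per-site ensembles) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not specify the weak learner, how exchanged classifiers are voted, or the weight-update rule, so the decision stumps, the unweighted majority vote within a round, the AdaBoost-style update, and all function names below are assumptions made for illustration only.

```python
import math

def stump_predict(stump, x):
    """A decision stump (assumed weak learner): threshold one feature."""
    feature, threshold, polarity = stump
    return polarity if x[feature] >= threshold else -polarity

def train_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({x[feature] for x in X}):
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(stump, xi) != yi)
                if best is None or err < best[0]:
                    best = (err, stump)
    return best[1]

def vote(stumps, x):
    """Assumption: unweighted majority vote over one round's stumps."""
    total = sum(stump_predict(h, x) for h in stumps)
    return 1 if total >= 0 else -1

def distributed_boost(sites, rounds):
    """sites: list of (X, y) disjoint partitions with labels in {-1, +1}.
    Returns one ensemble per site: a list of (alpha, round_stumps)."""
    weights = [[1.0 / len(y)] * len(y) for _, y in sites]
    ensembles = [[] for _ in sites]
    for _ in range(rounds):
        # Step 1: every site learns a classifier from its own data only.
        stumps = [train_stump(X, y, w) for (X, y), w in zip(sites, weights)]
        # Step 2: classifiers are exchanged; each site now holds `stumps`.
        for j, (X, y) in enumerate(sites):
            w = weights[j]
            preds = [vote(stumps, x) for x in X]
            # Step 3: weight the round's vote by its error on local data
            # (AdaBoost-style rule, assumed here).
            eps = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
            eps = min(max(eps, 1e-10), 1 - 1e-10)  # avoid log(0)
            alpha = 0.5 * math.log((1 - eps) / eps)
            ensembles[j].append((alpha, stumps))
            # Step 4: re-weight local examples and normalize locally.
            new_w = [wi * math.exp(-alpha * yi * p)
                     for wi, p, yi in zip(w, preds, y)]
            z = sum(new_w)
            weights[j] = [wi / z for wi in new_w]
    return ensembles

def predict(ensembles, x):
    """Ensemble of ensembles: sum the weighted votes of every site."""
    score = sum(alpha * vote(stumps, x)
                for site in ensembles for alpha, stumps in site)
    return 1 if score >= 0 else -1

def make_toy_sites(n=60, k=3):
    """Hypothetical toy data: label is +1 when the first feature >= 0.5,
    with points dealt round-robin to k disjoint sites."""
    sites = [([], []) for _ in range(k)]
    for i in range(n):
        x = (i / n, (i * 7 % n) / n)
        sites[i % k][0].append(x)
        sites[i % k][1].append(1 if x[0] >= 0.5 else -1)
    return sites

def accuracy(ensembles, sites):
    pairs = [(x, yi) for X, y in sites for x, yi in zip(X, y)]
    return sum(predict(ensembles, x) == yi for x, yi in pairs) / len(pairs)
```

Only the learned classifiers cross site boundaries in this sketch, never the training examples, which is the property the abstract relies on when it describes the communication overhead as very small. A call such as `accuracy(distributed_boost(make_toy_sites(), rounds=5), make_toy_sites())` exercises the whole loop on the toy data.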