The growing amount of available information and its distributed and heterogeneous nature have a major impact on the field of data mining. In this paper, we propose a framework for parallel and distributed boosting algorithms intended for efficiently integrating specialized classifiers learned over very large, distributed, and possibly heterogeneous databases that cannot fit into the main memory of a single computer. Boosting is a popular technique for constructing highly accurate classifier ensembles, in which the classifiers are trained serially and the weights on the training instances are adaptively set according to the performance of the previous classifiers. Our parallel boosting algorithm is designed for tightly coupled shared-memory systems with a small number of processors, with the objective of achieving maximal prediction accuracy in fewer iterations than boosting on a single processor. At each boosting round, all processors learn classifiers in parallel, and these classifiers are then combined according to the confidence of their predictions. Our distributed boosting algorithm is proposed primarily for learning from several disjoint data sites when the data cannot be merged together, although it can also be used for parallel learning, where a massive data set is partitioned into several disjoint subsets for more efficient analysis. At each boosting round, the proposed method combines classifiers from all sites and creates a classifier ensemble on each site. The final classifier is constructed as an ensemble of all the classifier ensembles built on the disjoint data sets. Experiments on several data sets show that parallel boosting can achieve the same or even better prediction accuracy considerably faster than standard sequential boosting.
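The serial, adaptively weighted training that boosting relies on can be sketched as follows. This is a minimal illustration of standard AdaBoost with threshold stumps on a toy 1-D problem, not the paper's exact algorithm; the function names and toy data are invented for the example.

```python
import math

def make_stump(t, s):
    # 1-D threshold classifier: predicts s when x > t, otherwise -s
    return lambda x: s if x > t else -s

def adaboost(xs, ys, rounds=10):
    """Minimal AdaBoost sketch: classifiers are trained serially, and the
    instance weights are adapted so each new weak learner concentrates on
    the examples its predecessors misclassified."""
    n = len(xs)
    w = [1.0 / n] * n                        # start with uniform weights
    candidates = [make_stump(t, s) for t in xs for s in (+1, -1)]
    ensemble = []                            # (alpha, classifier) pairs
    for _ in range(rounds):
        def werr(h):                         # weighted training error of h
            return sum(wi for wi, x, y in zip(w, xs, ys) if h(x) != y)
        h = min(candidates, key=werr)        # best weak learner this round
        err = min(max(werr(h), 1e-10), 1 - 1e-10)
        if err >= 0.5:                       # no better than chance: stop
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, h))
        # adaptive reweighting: misclassified instances gain weight
        w = [wi * math.exp(-alpha * y * h(x)) for wi, x, y in zip(w, xs, ys)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble

def predict(ensemble, x):
    # confidence-weighted vote of all weak classifiers
    return 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1

# toy 1-D data: negative below the boundary, positive above it
xs = [1, 2, 3, 4, 6, 7, 8, 9]
ys = [-1, -1, -1, 1, 1, 1, 1, 1]
model = adaboost(xs, ys)
print([predict(model, x) for x in xs])       # -> [-1, -1, -1, 1, 1, 1, 1, 1]
```

The exponential reweighting step is what makes boosting inherently sequential, and is exactly what the parallel and distributed variants described above must work around.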
The results also indicate that distributed boosting achieves comparable or slightly better classification accuracy than standard boosting, while requiring much less memory and computational time because each site operates on a smaller data set.
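The distributed scheme can be sketched in simplified form: each site boosts on its own disjoint partition, the weak classifiers are exchanged at every round, and each site reweights its instances against the shared classifiers, so the final model is an ensemble of what all sites learned. This is a hedged approximation of the idea, not the authors' exact method; the stump learner, the reweighting details, and the toy data are assumptions for illustration.

```python
import math

def make_stump(t, s):
    # 1-D threshold classifier: predicts s when x > t, otherwise -s
    return lambda x: s if x > t else -s

def train_stump(data, weights):
    # weak learner: the stump with the lowest weighted error on this site's data
    candidates = [make_stump(t, s) for t, _ in data for s in (+1, -1)]
    def werr(h):
        return sum(w for w, (x, y) in zip(weights, data) if h(x) != y)
    h = min(candidates, key=werr)
    return h, werr(h)

def distributed_boosting(sites, rounds=5):
    """Sketch: at each round every site trains a weak classifier on its own
    disjoint weighted data; the round's classifiers are then exchanged, and
    each site adapts its instance weights to the combined classifiers. The
    final model is the ensemble of all classifiers from all sites and rounds."""
    weights = [[1.0 / len(s)] * len(s) for s in sites]
    ensemble = []                            # accumulated (alpha, classifier) pairs
    for _ in range(rounds):
        round_cls = []
        for site, w in zip(sites, weights):
            h, err = train_stump(site, w)
            err = min(max(err, 1e-10), 1 - 1e-10)
            round_cls.append((0.5 * math.log((1 - err) / err), h))
        ensemble.extend(round_cls)           # "broadcast" this round's classifiers
        # each site reweights its own instances against the shared classifiers
        for i, site in enumerate(sites):
            margins = [sum(a * h(x) for a, h in round_cls) * y for x, y in site]
            new_w = [wi * math.exp(-m) for wi, m in zip(weights[i], margins)]
            z = sum(new_w)
            weights[i] = [wi / z for wi in new_w]
    return ensemble

def predict(ensemble, x):
    return 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1

# two disjoint sites drawn from the same concept (positive above ~4.5)
site_a = [(1, -1), (4, -1), (6, 1), (9, 1)]
site_b = [(2, -1), (5, -1), (7, 1), (8, 1)]
model = distributed_boosting([site_a, site_b])
print([predict(model, x) for x in [1, 4, 7, 9]])   # -> [-1, -1, 1, 1]
```

Note that only classifiers, never raw instances, cross site boundaries, which is what makes the approach applicable when the data cannot be merged.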