We focus on the problem of data mining over large-scale, fully distributed databases, where each node stores only one data record. We assume that a data record is never allowed to leave the node on which it is stored. Possible motivations for this assumption include privacy concerns or the lack of a centralized infrastructure. To tackle this problem, we previously proposed the generic gossip learning framework (GoLF), but so far we have studied only basic linear algorithms. In this paper we implement the well-known boosting technique in GoLF. Boosting techniques have attracted growing attention in machine learning due to their outstanding performance in many practical applications. Here, we present an implementation of a boosting algorithm based on FilterBoost. Our main algorithmic contribution is the derivation of a purely online, multi-class version of FilterBoost, so that it can be employed in GoLF. We also propose improvements to GoLF that aim to maximize the diversity of the evolving models gossiped in the network, a feature that we show to be important. We evaluate the robustness and convergence speed of the algorithm empirically over three benchmark databases, compare the algorithm with the sequential AdaBoost algorithm, and test its performance in a failure scenario involving message drop, message delay, and node churn.
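The gossip learning setting described above can be illustrated with a minimal simulation. The sketch below is not the paper's FilterBoost implementation; it is a simplified, single-model random-walk variant under illustrative assumptions: each node holds exactly one labeled record, a linear model "hops" to a uniformly random node at each step, and the visited node applies one online logistic-regression update using only its local record (so no record ever leaves its node). The function names `sgd_step` and `gossip_learning` and the toy update rule are our own, not from the paper.

```python
import math
import random

def sgd_step(w, x, y, lr=0.1):
    """One online logistic-regression step on a single record (y in {-1, +1})."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    # gradient scale of the log-loss: sigmoid(-margin)
    g = 1.0 / (1.0 + math.exp(margin))
    return [wi + lr * y * xi * g for wi, xi in zip(w, x)]

def gossip_learning(records, steps=3000, seed=0):
    """Random-walk sketch of gossip learning: the model visits a uniformly
    random node each step and is updated on that node's single record."""
    rng = random.Random(seed)
    w = [0.0] * len(records[0][0])
    for _ in range(steps):
        x, y = rng.choice(records)  # the record stays local; only w moves
        w = sgd_step(w, x, y)
    return w

# Toy usage: four nodes, one record each, linearly separable labels.
records = [([1.0, 2.0], 1), ([2.0, 1.0], -1),
           ([0.5, 1.5], 1), ([1.5, 0.5], -1)]
w = gossip_learning(records)
```

In the actual framework, many models walk the network concurrently and nodes may merge incoming models, which is where the diversity of the evolving models becomes important; this sketch keeps a single model only to make the data-stays-local protocol explicit.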