Bagging and boosting are two popular ensemble methods that achieve better accuracy than a single classifier, but both face limitations on massive datasets, where the sheer size of the data becomes a bottleneck. Voting many classifiers, each built on a small subset of the data ("pasting small votes"), is a promising alternative for learning from massive datasets: it retains the accuracy benefits of boosting and bagging while avoiding the need to process the full dataset at once. We propose a framework for building hundreds or thousands of such classifiers on small subsets of data in a distributed environment. Experiments show this approach is fast, accurate, and scalable to massive datasets.
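The core idea can be illustrated with a minimal sketch: train many weak classifiers, each on a small random subset ("bite") of the data, and combine them by majority vote. This is not the authors' implementation; the one-dimensional decision-stump learner, the synthetic data, and all function names below are illustrative assumptions.

```python
import random
from collections import Counter

def train_stump(subset):
    # Illustrative weak learner: a one-feature threshold classifier.
    # Pick the threshold (from the subset's own values) that best
    # separates the two classes under the rule "predict x > threshold".
    best_thr, best_acc = 0.0, -1.0
    for x, _ in subset:
        acc = sum((xi > x) == yi for xi, yi in subset) / len(subset)
        if acc > best_acc:
            best_thr, best_acc = x, acc
    return best_thr

def paste_small_votes(data, n_classifiers=101, bite_size=8, seed=0):
    # "Pasting small votes": build many classifiers, each trained on a
    # small random subset of the data. Each subset is tiny relative to
    # the full dataset, so the bites could be drawn and trained in
    # parallel on distributed nodes.
    rng = random.Random(seed)
    return [train_stump(rng.sample(data, bite_size))
            for _ in range(n_classifiers)]

def predict(thresholds, x):
    # Combine the small-subset classifiers by simple majority vote.
    votes = Counter(x > thr for thr in thresholds)
    return votes.most_common(1)[0][0]

# Synthetic 1-D example: the true label is simply x > 5.
rng = random.Random(42)
data = [(x, x > 5.0) for x in (rng.uniform(0, 10) for _ in range(1000))]
ensemble = paste_small_votes(data)
```

Because each classifier sees only `bite_size` examples, training cost per classifier is independent of the dataset size; the distributed framework described in the abstract parallelizes exactly this per-bite work.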