This paper proposes a method for generating classifiers from large datasets by building a committee of simple base classifiers using a standard boosting algorithm. It permits the processing of large datasets even if the underlying base learning algorithm cannot efficiently do so. The basic idea is to split incoming data into chunks and build a committee from classifiers trained on these individual chunks. Our method extends earlier work by adaptively pruning the committee. This is essential when applying the algorithm in practice because it dramatically reduces the algorithm's running time and memory consumption. It also makes it possible to efficiently "race" committees corresponding to different chunk sizes. Racing is important because our empirical results show that the accuracy of the resulting committee can vary significantly with the chunk size. The results also show that pruning is indeed crucial to make the method practical for large datasets in terms of running time and memory requirements. Surprisingly, they demonstrate that pruning can also improve accuracy.
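The chunk, prune, and race idea described above can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not give the boosting scheme or the pruning criterion, so the sketch assumes an unweighted majority vote in place of boosting, a one-feature threshold stump as the base learner, and a simple rule that keeps a new committee member only if it raises accuracy on held-out validation data. The race here runs the chunk sizes one after another over stored data rather than concurrently on a stream. All function names (train_stump, vote, build_committee, race) are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[float, int]           # (single numeric feature, label in {0, 1})
Classifier = Callable[[float], int]


def train_stump(chunk: Sequence[Example]) -> Classifier:
    """Assumed base learner: the single-threshold rule with the fewest errors on the chunk."""
    best_err, best_thr, best_sign = len(chunk) + 1, 0.0, 1
    for thr, _ in chunk:
        for sign in (1, -1):
            err = sum(1 for x, y in chunk
                      if (1 if sign * (x - thr) >= 0 else 0) != y)
            if err < best_err:
                best_err, best_thr, best_sign = err, thr, sign
    return lambda x, t=best_thr, s=best_sign: 1 if s * (x - t) >= 0 else 0


def vote(committee: List[Classifier], x: float) -> int:
    """Unweighted majority vote; ties go to class 1 (an assumption, not boosting)."""
    votes = sum(clf(x) for clf in committee)
    return 1 if 2 * votes >= len(committee) else 0


def accuracy(committee: List[Classifier], data: Sequence[Example]) -> float:
    return sum(1 for x, y in data if vote(committee, x) == y) / len(data)


def build_committee(stream: Sequence[Example], chunk_size: int,
                    validation: Sequence[Example]) -> Tuple[List[Classifier], float]:
    """Train one classifier per full chunk; discard any that does not improve validation accuracy."""
    committee: List[Classifier] = []
    best_acc = 0.0
    for start in range(0, len(stream) - chunk_size + 1, chunk_size):
        chunk = stream[start:start + chunk_size]
        candidate = committee + [train_stump(chunk)]
        acc = accuracy(candidate, validation)
        # Assumed pruning rule: the first member is always kept, later ones only if they help.
        if not committee or acc > best_acc:
            committee, best_acc = candidate, acc
    return committee, best_acc


def race(stream: Sequence[Example], chunk_sizes: Sequence[int],
         validation: Sequence[Example]) -> Tuple[int, List[Classifier], float]:
    """Build one committee per chunk size and return the most accurate one on validation data."""
    results = {size: build_committee(stream, size, validation) for size in chunk_sizes}
    best_size = max(results, key=lambda s: results[s][1])
    committee, acc = results[best_size]
    return best_size, committee, acc
```

Calling race(stream, [100, 500, 1000], validation) on a list of (feature, label) pairs would return the winning chunk size, its pruned committee, and that committee's validation accuracy. Because rejected members are never stored, the committee stays small, which is the source of the time and memory savings the abstract attributes to pruning.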