Bagging forms a committee of classifiers by bootstrap aggregation of training sets drawn from a pool of training data. A simple alternative to bagging is to partition the data into disjoint subsets. Experiments with decision tree and neural network classifiers on a variety of datasets show that, given partitions and bags of the same size, disjoint partitions yield performance equivalent to, or better than, bootstrap aggregates (bags). Many applications (e.g., protein structure prediction) involve datasets too large to fit in the memory of a typical computer, so bagging with samples the size of the full dataset is impractical. Our results indicate that, in such applications, the simple approach of building a committee of n classifiers from disjoint partitions, each 1/n the size of the data (and therefore memory resident during learning), in a distributed fashion yields a classifier with a bagging-like performance gain. Learning from distributed disjoint partitions is significantly less complex and faster than bagging.
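To make the two committee-building schemes concrete, here is a minimal sketch, not the authors' implementation, using scikit-learn decision trees and simple majority voting; the synthetic dataset, committee size, and all parameter values are illustrative assumptions.

```python
# Sketch: bagging vs. disjoint-partition ensembles (illustrative assumptions:
# synthetic data, decision trees, committee size n = 9, majority voting).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

n = 9  # committee size (odd, so binary majority votes cannot tie)

def majority_vote(members, X):
    # Stack each member's 0/1 predictions and take the per-sample majority.
    votes = np.stack([m.predict(X) for m in members])
    return (votes.mean(axis=0) > 0.5).astype(int)

# Bagging: each member trains on a bootstrap sample the size of the full data.
bagged = []
for _ in range(n):
    idx = rng.integers(0, len(X_train), size=len(X_train))
    bagged.append(DecisionTreeClassifier(random_state=0)
                  .fit(X_train[idx], y_train[idx]))

# Disjoint partitions: shuffle once, split the indices into n non-overlapping
# subsets, each 1/n the size of the data; each subset fits in memory and
# could be trained on a separate machine.
parts = np.array_split(rng.permutation(len(X_train)), n)
disjoint = [DecisionTreeClassifier(random_state=0).fit(X_train[p], y_train[p])
            for p in parts]

for name, members in (("bagging", bagged), ("disjoint partitions", disjoint)):
    acc = (majority_vote(members, X_test) == y_test).mean()
    print(f"{name}: test accuracy = {acc:.3f}")
```

Note that the disjoint-partition scheme touches each training example exactly once, whereas bagging resamples the full dataset n times; this is the source of the complexity and memory savings claimed above.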