Previous research has shown that an averaging ensemble can scale up learning over very large cost-sensitive datasets with linear speedup, independent of the underlying learning algorithm, while achieving the same or better accuracy than a single model trained on the entire dataset. However, a major drawback is its inefficiency at prediction time, since every base model in the ensemble must be consulted to produce a final prediction. In this paper, we propose several approaches to reduce the number of base classifiers. Among the methods explored, our empirical studies show that the benefit-based greedy approach can safely remove more than 90% of the base models while matching or exceeding the prediction accuracy of the original ensemble. Assuming that each base classifier consumes one unit of prediction time, removing 90% of the base classifiers translates to a tenfold prediction speedup. On top of pruning, we propose a novel dynamic scheduling approach that further reduces the "expected" number of classifiers consulted per prediction. It measures the confidence of a prediction made by a subset of the classifiers in the pruned ensemble, and uses that confidence to decide whether more classifiers are needed to produce the same prediction as the original unpruned ensemble. This approach reduces the expected number of classifiers by a further 25% to 75% without loss of accuracy.
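The two mechanisms in the abstract can be illustrated in a minimal sketch. All names, the specific benefit values, and the synthetic data below are illustrative assumptions, not the paper's actual algorithm: `greedy_prune` adds, at each step, the base model whose inclusion most increases total benefit on held-out data, and `dynamic_predict` consults the pruned classifiers one at a time, stopping as soon as the as-yet-unconsulted classifiers can no longer move the ensemble average across the decision boundary.

```python
import numpy as np

rng = np.random.default_rng(0)

def total_benefit(avg_scores, labels, benefit_tp=5.0, cost_fp=1.0):
    # Illustrative cost-sensitive decision rule: predict positive when the
    # expected benefit of doing so exceeds the expected false-positive cost.
    decide_pos = avg_scores * benefit_tp > (1.0 - avg_scores) * cost_fp
    gains = np.where(decide_pos & (labels == 1), benefit_tp, 0.0)
    losses = np.where(decide_pos & (labels == 0), cost_fp, 0.0)
    return float((gains - losses).sum())

def greedy_prune(scores, labels):
    """Benefit-based greedy selection (sketch): repeatedly add the base
    model whose inclusion most increases total benefit on held-out data;
    stop when no remaining candidate improves it."""
    n_models = scores.shape[0]
    selected, best = [], -np.inf
    while len(selected) < n_models:
        gains = [total_benefit(scores[selected + [m]].mean(axis=0), labels)
                 if m not in selected else -np.inf
                 for m in range(n_models)]
        m_star = int(np.argmax(gains))
        if gains[m_star] <= best:
            break
        best = gains[m_star]
        selected.append(m_star)
    return selected

def dynamic_predict(example_scores, order):
    """Dynamic scheduling (sketch): consult classifiers in pruned order and
    stop once the still-unconsulted classifiers, whose scores lie in [0, 1],
    can no longer move the ensemble average across the 0.5 boundary."""
    k = len(order)
    running = 0.0
    for i, m in enumerate(order, start=1):
        running += example_scores[m]
        lo = running / k                 # if every remaining score were 0
        hi = (running + (k - i)) / k     # if every remaining score were 1
        if lo > 0.5 or hi < 0.5:
            return running / i, i        # decision already fixed: stop early
    return running / k, k

# Synthetic demo: 20 base models scoring 200 examples (hypothetical data).
labels = (rng.random(200) < 0.4).astype(int)
scores = 0.5 * labels + 0.5 * rng.random((20, 200))  # mildly informative
kept = greedy_prune(scores, labels)
```

By construction, `dynamic_predict` stops early only when the partial average is guaranteed to fall on the same side of the boundary as the full pruned-ensemble average, which mirrors the abstract's claim that scheduling preserves the unpruned prediction while cutting the expected number of classifiers consulted.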