A double pruning algorithm for classification ensembles

  • Authors:
  • Víctor Soto, Gonzalo Martínez-Muñoz, Daniel Hernández-Lobato, Alberto Suárez

  • Affiliations:
  • Universidad Autónoma de Madrid, EPS, Madrid, Spain (all authors)

  • Venue:
  • MCS'10: Proceedings of the 9th International Conference on Multiple Classifier Systems
  • Year:
  • 2010

Abstract

This article introduces a double pruning algorithm that can be used to reduce the storage requirements, speed up the classification process, and improve the performance of parallel ensembles. A key element in the design of the algorithm is the estimation of the class label that the complete ensemble would assign to a given test instance by polling only a fraction of its classifiers. Instead of applying this form of dynamic (instance-based) pruning to the original ensemble, we propose to apply it to a subset of classifiers selected using standard ensemble pruning techniques. The pruned subensemble is built by first modifying the order in which classifiers are aggregated in the ensemble and then selecting the first classifiers in the ordered sequence. Experiments on benchmark problems illustrate the improvements that can be obtained with this technique. Specifically, starting from a bagging ensemble of 101 CART trees, only the 21 trees of the pruned ordered ensemble need to be stored in memory. Depending on the classification task, on average only 5 to 12 of these 21 classifiers are queried to compute the predictions. The generalization performance achieved by this double pruning algorithm is similar to that of pruned ordered bagging and significantly better than that of standard bagging.
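
The sketch below illustrates how the two pruning stages can be combined; it is a minimal, hypothetical simplification, not the authors' implementation. It assumes scikit-learn's BaggingClassifier over CART trees, reorders the classifiers by their individual training accuracy (a crude stand-in for the ordered-aggregation heuristics used in ensemble pruning), and stops polling a test instance as soon as the leading class can no longer be overturned by the remaining votes (a deterministic stand-in for the paper's statistical instance-based stopping criterion).

```python
# Hypothetical sketch of a double-pruning pipeline (not the paper's exact method).
import numpy as np
from collections import Counter
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Full bagging ensemble of 101 CART trees, as in the experimental setup.
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=101, random_state=0)
bag.fit(X_tr, y_tr)

# First pruning: reorder the classifiers and keep the first 21 of the sequence.
# Ordering by individual training accuracy is an assumed simplification.
scores = [tree.score(X_tr, y_tr) for tree in bag.estimators_]
order = np.argsort(scores)[::-1]
pruned = [bag.estimators_[i] for i in order[:21]]

def predict_instance(trees, x):
    """Second pruning: poll trees one at a time and stop as soon as the
    current leading class cannot lose, so most instances query few trees."""
    votes = Counter()
    for k, tree in enumerate(trees, start=1):
        votes[int(tree.predict(x.reshape(1, -1))[0])] += 1
        leader, lead_count = votes.most_common(1)[0]
        remaining = len(trees) - k
        runner_up = max((c for cls, c in votes.items() if cls != leader), default=0)
        if runner_up + remaining < lead_count:  # outcome already decided
            return leader, k
    return votes.most_common(1)[0][0], len(trees)

preds, queried = zip(*(predict_instance(pruned, x) for x in X_te))
print("accuracy:", np.mean(np.array(preds) == y_te))
print("avg trees queried:", np.mean(queried))
```

Running the sketch reports the accuracy of the doubly pruned ensemble together with the average number of trees actually queried per instance, the two quantities the abstract highlights (memory footprint of the pruned subensemble and the fraction of it consulted at prediction time).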