A double pruning algorithm for classification ensembles

  • Authors:
  • Víctor Soto, Gonzalo Martínez-Muñoz, Daniel Hernández-Lobato, Alberto Suárez

  • Affiliations:
  • Universidad Autónoma de Madrid, EPS, Madrid, Spain (all authors)

  • Venue:
  • MCS'10: Proceedings of the 9th International Conference on Multiple Classifier Systems
  • Year:
  • 2010

Abstract

This article introduces a double pruning algorithm that can be used to reduce the storage requirements, speed up the classification process, and improve the performance of parallel ensembles. A key element in the design of the algorithm is the estimation of the class label that the complete ensemble would assign to a given test instance by polling only a fraction of its classifiers. Instead of applying this form of dynamic (instance-based) pruning to the original ensemble, we propose to apply it to a subset of classifiers selected using standard ensemble pruning techniques. The pruned subensemble is built by first modifying the order in which classifiers are aggregated in the ensemble and then selecting the first classifiers in the ordered sequence. Experiments on benchmark problems illustrate the improvements that can be obtained with this technique. Specifically, starting from a bagging ensemble of 101 CART trees, only the 21 trees of the pruned ordered ensemble need to be stored in memory. Depending on the classification task, on average only 5 to 12 of these 21 classifiers are queried to compute the predictions. The generalization performance achieved by this double pruning algorithm is similar to that of pruned ordered bagging and significantly better than that of standard bagging.
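
The sketch below illustrates how the two pruning stages can be combined; it is a minimal, hypothetical simplification, not the authors' implementation. It assumes scikit-learn's BaggingClassifier over CART trees, reorders the classifiers by their individual training accuracy (a crude stand-in for the ordered-aggregation heuristics used in ensemble pruning), and stops polling a test instance as soon as the leading class can no longer be overturned by the remaining votes (a deterministic stand-in for the paper's statistical instance-based stopping criterion).

```python
# Hypothetical sketch of a double-pruning pipeline (not the paper's exact method).
import numpy as np
from collections import Counter
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Full bagging ensemble of 101 CART trees, as in the experimental setup.
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=101, random_state=0)
bag.fit(X_tr, y_tr)

# First pruning: reorder the classifiers and keep the first 21 of the sequence.
# Ordering by individual training accuracy is an assumed simplification.
scores = [tree.score(X_tr, y_tr) for tree in bag.estimators_]
order = np.argsort(scores)[::-1]
pruned = [bag.estimators_[i] for i in order[:21]]

def predict_instance(trees, x):
    """Second pruning: poll trees one at a time and stop as soon as the
    current leading class cannot lose, so most instances query few trees."""
    votes = Counter()
    for k, tree in enumerate(trees, start=1):
        votes[int(tree.predict(x.reshape(1, -1))[0])] += 1
        leader, lead_count = votes.most_common(1)[0]
        remaining = len(trees) - k
        runner_up = max((c for cls, c in votes.items() if cls != leader), default=0)
        if runner_up + remaining < lead_count:  # outcome already decided
            return leader, k
    return votes.most_common(1)[0][0], len(trees)

preds, queried = zip(*(predict_instance(pruned, x) for x in X_te))
print("accuracy:", np.mean(np.array(preds) == y_te))
print("avg trees queried:", np.mean(queried))
```

Running the sketch reports the accuracy of the doubly pruned ensemble together with the average number of trees actually queried per instance, the two quantities the abstract highlights (memory footprint of the pruned subensemble and the fraction of it consulted at prediction time).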