Constructing ensembles of classifiers by means of weighted instance selection

  • Authors: Nicolás García-Pedrajas
  • Affiliations: Department of Computing and Numerical Analysis, University of Córdoba, Córdoba, Spain
  • Venue: IEEE Transactions on Neural Networks
  • Year: 2009


Abstract

In this paper, we approach the problem of constructing ensembles of classifiers from the point of view of instance selection. Instance selection aims to obtain a subset of the available training instances that achieves at least the same performance as the whole training set; instance selection algorithms thus try to preserve classifier performance while reducing the number of instances in the training set. Boosting methods, in turn, construct an ensemble of classifiers iteratively, focusing each new member on the most difficult instances by means of a biased distribution over the training instances. In this work, we show how these two methodologies can be combined advantageously: we use instance selection algorithms for boosting, taking as the objective to be optimized the training error weighted by the biased distribution of the instances given by the boosting method. Our method can thus be considered boosting by instance selection. Instance selection has mostly been developed and used for k-nearest neighbor (k-NN) classifiers, so, as a first step, our methodology is applied to constructing ensembles of k-NN classifiers. Constructing ensembles of classifiers by means of instance selection has the important feature of reducing the space complexity of the final ensemble, as only a subset of the instances is selected for each classifier. The methodology is not, however, restricted to k-NN classifiers. Other classifiers, such as decision trees and support vector machines (SVMs), may also benefit from a smaller training set, as they produce simpler models when an instance selection algorithm is applied before training. In the experimental section, we show that the proposed approach produces better and simpler ensembles than the random subspace method (RSM) for k-NN and than standard ensemble methods for C4.5 and SVMs.
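
To make the combination concrete, the sketch below shows one plausible reading of "boosting by instance selection": an AdaBoost.M1-style loop in which each round's member is a 1-NN classifier built on a selected subset of instances, and the subset is chosen to minimize the training error weighted by the current boosting distribution. The subset search here is a simple biased random search standing in for a proper instance selection algorithm; the function and parameter names (boost_by_instance_selection, subset_frac, n_candidates) are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier


def weighted_error(clf, X, y, w):
    """Training error weighted by the boosting distribution w."""
    return float(np.sum(w * (clf.predict(X) != y)))


def boost_by_instance_selection(X, y, n_rounds=10, subset_frac=0.2,
                                n_candidates=20, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    w = np.full(n, 1.0 / n)        # boosting distribution over instances
    members, alphas = [], []
    size = max(3, int(subset_frac * n))
    for _ in range(n_rounds):
        # Instance selection step (placeholder): among candidate subsets,
        # sampled with a bias toward currently hard instances, keep the one
        # whose 1-NN classifier minimizes the weighted training error.
        best_clf, best_err = None, np.inf
        for _ in range(n_candidates):
            idx = rng.choice(n, size=size, replace=False, p=w)
            clf = KNeighborsClassifier(n_neighbors=1).fit(X[idx], y[idx])
            err = weighted_error(clf, X, y, w)
            if err < best_err:
                best_clf, best_err = clf, err
        if best_err >= 0.5:        # member no better than chance: stop
            break
        eps = max(best_err, 1e-10)
        alpha = 0.5 * np.log((1.0 - eps) / eps)   # AdaBoost member weight
        wrong = best_clf.predict(X) != y
        w *= np.exp(alpha * np.where(wrong, 1.0, -1.0))  # reweight instances
        w /= w.sum()               # renormalize the distribution
        members.append(best_clf)
        alphas.append(alpha)
    return members, np.array(alphas)


def ensemble_predict(members, alphas, X, classes):
    """Weighted vote of the ensemble members."""
    votes = np.zeros((len(X), len(classes)))
    for clf, a in zip(members, alphas):
        pred = clf.predict(X)
        for j, c in enumerate(classes):
            votes[:, j] += a * (pred == c)
    return np.asarray(classes)[np.argmax(votes, axis=1)]
```

Note that each ensemble member stores only its selected subset, which is the source of the space-complexity reduction the abstract describes, and that the random subset search can be swapped for the paper's actual instance selection algorithm without changing the boosting loop.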