Random feature subset selection for ensemble based classification of data with missing features

  • Authors:
  • Joseph DePasquale;Robi Polikar

  • Affiliations:
  • Signal Processing and Pattern Recognition Laboratory, Electrical and Computer Engineering, Rowan University, Glassboro, NJ;Signal Processing and Pattern Recognition Laboratory, Electrical and Computer Engineering, Rowan University, Glassboro, NJ

  • Venue:
  • MCS'07 Proceedings of the 7th international conference on Multiple classifier systems
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

We report on our recent progress in developing an ensemble of classifiers based algorithm for addressing the missing feature problem. Inspired in part by the random subspace method, and in part by an AdaBoost type distribution update rule for creating a sequence of classifiers, the proposed algorithm generates an ensemble of classifiers, each trained on a different subset of the available features. Then, an instance with missing features is classified using only those classifiers whose training dataset did not include the currently missing features. Within this framework, we experiment with several bootstrap sampling strategies each using a slightly different distribution update rule. We also analyze the effect of the algorithm's primary free parameter (the number of features used to train each classifier) on its performance. We show that the algorithm is able to accommodate data with up to 30% missing features, with little or no significant performance drop.