Best subset feature selection for massive mixed-type problems

  • Authors:
  • Eugene Tuv; Alexander Borisov; Kari Torkkola

  • Affiliations:
  • Intel, Analysis and Control Technology, Chandler, AZ; Intel, Analysis and Control Technology, N. Novgorod, Russia; Motorola, Intelligent Systems Lab, Tempe, AZ

  • Venue:
  • IDEAL'06: Proceedings of the 7th International Conference on Intelligent Data Engineering and Automated Learning
  • Year:
  • 2006

Abstract

We address the problem of identifying a non-redundant subset of important variables. All modern feature selection approaches, including filters, wrappers, and embedded methods, run into problems in very general settings with massive mixed-type data and complex relationships between the inputs and the target. We propose an efficient ensemble-based approach that measures statistical independence between a target and a potentially very large number of inputs, including any meaningful order of interactions between them; removes redundancies among the relevant inputs; and finally ranks the variables in the identified minimum feature set. Experiments with synthetic data illustrate the sensitivity and selectivity of the method, while its scalability is demonstrated on a real car sensor database.
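
The abstract names three stages (relevance testing, redundancy removal, ranking) without giving algorithmic details. The sketch below is only an illustration of that pipeline shape under strong simplifying assumptions, not the authors' method: absolute Pearson correlation stands in for the ensemble-based importance measure, shuffled copies of the inputs serve as a noise baseline, only numeric columns are handled, and interactions are ignored. All names (`select_features`, `n_contrasts`, `redundancy`) are hypothetical.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def select_features(columns, target, n_contrasts=50, redundancy=0.9, seed=0):
    """Return a non-redundant list of relevant feature names, ranked by score.

    Assumption: |Pearson correlation| is a simple stand-in for an
    ensemble-derived importance score.
    """
    rng = random.Random(seed)
    names = list(columns)
    score = {n: abs(pearson(columns[n], target)) for n in names}

    # Stage 1: relevance. Shuffled copies of real columns are unrelated to the
    # target by construction, so their scores estimate the noise level.
    noise = []
    for _ in range(n_contrasts):
        shuffled = list(columns[rng.choice(names)])
        rng.shuffle(shuffled)
        noise.append(abs(pearson(shuffled, target)))
    threshold = max(noise)
    relevant = sorted((n for n in names if score[n] > threshold),
                      key=lambda n: -score[n])

    # Stage 2: redundancy removal. Walk from the strongest feature down and
    # skip any feature that nearly duplicates one already kept.
    # Stage 3: ranking. The kept list stays in descending score order.
    selected = []
    for n in relevant:
        if all(abs(pearson(columns[n], columns[s])) < redundancy for s in selected):
            selected.append(n)
    return selected


# Synthetic demo: two informative inputs, one near-duplicate, two noise inputs.
rng = random.Random(42)
m = 300
x1 = [rng.gauss(0, 1) for _ in range(m)]
x2 = [rng.gauss(0, 1) for _ in range(m)]
columns = {
    "x1": x1,
    "x1_copy": [v + rng.gauss(0, 0.01) for v in x1],
    "x2": x2,
    "noise_a": [rng.gauss(0, 1) for _ in range(m)],
    "noise_b": [rng.gauss(0, 1) for _ in range(m)],
}
target = [2 * a + b + rng.gauss(0, 0.1) for a, b in zip(x1, x2)]

selected = select_features(columns, target)
print(selected)
```

In this toy setup, only one of `x1` and `x1_copy` should survive the redundancy step, since they are almost perfectly correlated with each other. A correlation score cannot detect nonlinear or interaction effects or handle categorical inputs, which is precisely the gap the abstract says an ensemble-based measure is meant to address.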