Active feature selection using classes

  • Authors: Huan Liu; Lei Yu; Manoranjan Dash; Hiroshi Motoda

  • Affiliations: Department of Computer Science & Engineering, Arizona State University, Tempe, AZ; Department of Computer Science & Engineering, Arizona State University, Tempe, AZ; Department of Elec. & Computer Engineering, Northwestern University, Evanston, IL; Institute of Scientific & Industrial Research, Osaka University, Ibaraki, Osaka, Japan

  • Venue: PAKDD'03 Proceedings of the 7th Pacific-Asia conference on Advances in knowledge discovery and data mining
  • Year: 2003

Abstract

Feature selection is frequently used in data pre-processing for data mining. When the training data set is too large, sampling is commonly used to overcome the difficulty. This work investigates the applicability of active sampling to feature selection in a filter model setting. Our objective is to partition the data by taking advantage of class information, so that feature selection achieves the same or better performance than with random sampling while using fewer but more relevant instances. Two versions of active feature selection that employ class information are proposed and empirically evaluated. We conduct extensive experiments on benchmark data sets, compare the results with random sampling, and analyze why class-based active feature selection behaves as it does. The results will help in dealing with large data sets and provide ideas for scaling up other feature selection algorithms.
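
The abstract does not spell out how the two proposed versions partition the data, so the following is only a minimal sketch of the general idea of sampling within class-based partitions, set beside plain random sampling. The function names, the proportional per-class allocation, and the `fraction` and `seed` parameters are all assumptions made for illustration and are not taken from the paper.

```python
import random
from collections import defaultdict


def class_partitioned_sample(labels, fraction, seed=0):
    """Pick instance indices by sampling inside each class partition.

    Assumption: one partition per class label, with a per-class sample
    size proportional to the class size (at least one instance each).
    """
    rng = random.Random(seed)
    partitions = defaultdict(list)
    for idx, label in enumerate(labels):
        partitions[label].append(idx)

    chosen = []
    for members in partitions.values():
        k = max(1, round(fraction * len(members)))
        chosen.extend(rng.sample(members, k))
    return sorted(chosen)


def random_sample(labels, fraction, seed=0):
    """Baseline: pick indices uniformly at random, ignoring class labels."""
    rng = random.Random(seed)
    k = max(1, round(fraction * len(labels)))
    return sorted(rng.sample(range(len(labels)), k))


if __name__ == "__main__":
    # Hypothetical imbalanced labels: 90 instances of "a", 10 of "b".
    labels = ["a"] * 90 + ["b"] * 10
    by_class = class_partitioned_sample(labels, 0.1, seed=1)
    baseline = random_sample(labels, 0.1, seed=1)
    print("class-partitioned minority count:",
          sum(1 for i in by_class if labels[i] == "b"))
    print("random-sample minority count:",
          sum(1 for i in baseline if labels[i] == "b"))
```

In this sketch the class-partitioned sample always contains at least one instance of every class, whereas a purely random sample of the same size may miss a small class entirely. The selected indices would then be passed to whatever filter-model feature selector is in use.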