A correlation-based model for unsupervised feature selection

  • Authors:
  • Michael Edward Houle; Nizar Grira

  • Affiliations:
  • National Institute of Informatics, Tokyo, Japan; National Institute of Informatics, Tokyo, Japan

  • Venue:
  • Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
  • Year:
  • 2007

Abstract

We propose a new model for feature evaluation and selection that assesses the propensity of the features to support two-set classification. For each item of the data set, the collection of features induces a ranking (ordered list) of the remaining items. The evaluation criterion favors features that result in the most consistent discrimination between relevant and non-relevant items within these ranked lists. The discrimination boundaries within a single list are determined combinatorially, according to the degree of correlation among the relevant sets of its members. The model makes no special assumptions about the nature of the data. A selection heuristic based on the model, using sequential forward generation, is also proposed, and it is compared experimentally with other unsupervised feature selection methods.
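The abstract describes the pipeline only at a high level, so the following is a minimal Python sketch of one plausible reading, not the authors' method. Assumptions: distances are Euclidean on the candidate feature subset; the "relevant set" of an item at depth k is its k nearest other items; two such sets are compared by the Pearson correlation of their 0/1 membership vectors; and the discrimination boundary is simply the depth k that maximizes the average correlation, a simplification standing in for the paper's combinatorial criterion. The function names (`rankings`, `set_correlation`, `subset_score`, `forward_select`) and the `k_max` parameter are hypothetical.

```python
import numpy as np


def rankings(X, features):
    """Rank, for every item, all other items by Euclidean distance
    measured on the given feature subset only."""
    Z = X[:, list(features)]
    d = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d, np.inf)  # push each item to the end of its own list
    return np.argsort(d, axis=1, kind="stable")[:, :-1]  # drop the item itself


def set_correlation(a, b, n):
    """Pearson correlation of the 0/1 membership vectors of sets a and b
    drawn from a universe of n items."""
    num = n * len(a & b) - len(a) * len(b)
    den = np.sqrt(len(a) * (n - len(a)) * len(b) * (n - len(b)))
    return num / den if den > 0 else 0.0


def subset_score(X, features, k_max):
    """Average over items of the best mean correlation between an item's
    k-neighbourhood and the k-neighbourhoods of its members.
    ASSUMPTION: the boundary k* is just the maximizing k (hypothetical rule)."""
    n = X.shape[0]
    k_max = min(k_max, n - 1)  # a list holds at most n - 1 other items
    R = rankings(X, features)
    total = 0.0
    for q in range(n):
        best = -1.0
        for k in range(1, k_max + 1):
            nq = set(R[q, :k].tolist())
            sc = np.mean([set_correlation(nq, set(R[v, :k].tolist()), n)
                          for v in nq])
            best = max(best, sc)
        total += best
    return total / n


def forward_select(X, n_select, k_max=5):
    """Sequential forward generation: greedily add the feature whose
    inclusion yields the highest subset score."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < n_select:
        scores = {f: subset_score(X, selected + [f], k_max) for f in remaining}
        f_best = max(scores, key=scores.get)
        selected.append(f_best)
        remaining.remove(f_best)
    return selected
```

Called as `forward_select(X, n_select=3)` on an items-by-features array, the sketch returns feature indices in the order they were added. The brute-force pairwise distance matrix makes it quadratic in the number of items, so it illustrates the control flow of forward generation rather than a scalable implementation.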