Constraint Score: A new filter method for feature selection with pairwise constraints

  • Authors: Daoqiang Zhang; Songcan Chen; Zhi-Hua Zhou

  • Affiliations: Department of Computer Science and Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China; Department of Computer Science and Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China; National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China

  • Venue: Pattern Recognition
  • Year: 2008

Quantified Score

Hi-index 0.01

Abstract

Feature selection is an important preprocessing step in mining high-dimensional data. In general, supervised feature selection methods, which exploit supervision information, outperform unsupervised ones, which do not. In the literature, nearly all existing supervised feature selection methods use class labels as supervision information. In this paper, we propose to use another form of supervision information for feature selection: pairwise constraints, which specify whether a pair of data samples belongs to the same class (must-link constraints) or to different classes (cannot-link constraints). Pairwise constraints arise naturally in many tasks and are more practical and less expensive to obtain than class labels. This topic has not yet been addressed in feature selection research. We call our pairwise-constraint-guided feature selection algorithm Constraint Score and compare it with the well-known Fisher Score and Laplacian Score algorithms. Experiments are carried out on several high-dimensional UCI and face data sets. The results show that, with very few pairwise constraints, Constraint Score achieves similar or even higher performance than Fisher Score with full class labels on the whole training data, and significantly outperforms Laplacian Score.
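To make the idea of constraint-guided filtering concrete, here is a minimal sketch in Python. The abstract does not give the scoring formula, so the ratio below is an assumption chosen for illustration: for each feature, the squared differences over must-link pairs are divided by the squared differences over cannot-link pairs. The function names `constraint_scores` and `rank_features` and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]  # indices of two samples in the data matrix


def constraint_scores(
    X: Sequence[Sequence[float]],
    must_link: Sequence[Pair],
    cannot_link: Sequence[Pair],
) -> List[float]:
    """Score each feature from pairwise constraints only (no class labels).

    Assumed formulation (not taken from the abstract): a feature is good if
    must-link pairs are close on it and cannot-link pairs are far apart,
    so a LOWER ratio means a better feature.
    """
    n_features = len(X[0])
    scores = []
    for r in range(n_features):
        # Spread of feature r across pairs known to share a class.
        within = sum((X[i][r] - X[j][r]) ** 2 for i, j in must_link)
        # Spread of feature r across pairs known to differ in class.
        between = sum((X[i][r] - X[j][r]) ** 2 for i, j in cannot_link)
        # A feature that never separates cannot-link pairs gets the worst score.
        scores.append(within / between if between > 0 else float("inf"))
    return scores


def rank_features(scores: Sequence[float]) -> List[int]:
    """Return feature indices ordered from best (lowest score) to worst."""
    return sorted(range(len(scores)), key=lambda r: scores[r])


# Toy data: 4 samples, 2 features. Feature 0 tracks the (hidden) class,
# feature 1 varies mostly within a class.
X = [
    [0.0, 5.0],
    [0.1, 1.0],
    [1.0, 4.0],
    [1.1, 0.0],
]
must_link = [(0, 1), (2, 3)]    # pairs stated to be in the same class
cannot_link = [(0, 2), (1, 3)]  # pairs stated to be in different classes

scores = constraint_scores(X, must_link, cannot_link)
ranking = rank_features(scores)
```

On this toy input, feature 0 receives the lower score and is ranked first, because it changes little across must-link pairs and clearly across cannot-link pairs. Only four pairwise constraints are used and no class label is ever read, which is the setting the abstract describes.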