Feature selection is an important problem in pattern classification systems. Supervised feature selection methods generally outperform unsupervised ones. However, almost all existing supervised methods use class labels as supervision; very little work has addressed other forms of supervision such as pairwise constraints, which specify whether a pair of data samples belongs to the same class (must-link constraints) or to different classes (cannot-link constraints). In practice, pairwise constraints are easy to obtain, since an annotator need only decide whether two examples belong to the same class or not. A filter method for feature selection with pairwise constraints, called Constraint Score, was recently proposed to exploit such supervision. Unfortunately, Constraint Score does not handle the case where only cannot-link constraints are given. Moreover, its conclusion that must-link constraints are more important than cannot-link constraints needs further verification, since from the viewpoint of the hypothesis-margin, cannot-link constraints appear to be the more informative of the two. In addition, like existing supervised feature selection methods, the hypothesis-margin based approach Simba also relies on class labels as supervision. In this paper, to further study feature selection with pairwise constraints, we introduce a novel hypothesis-margin based approach, called Simba-sc, which uses only cannot-link constraints as supervision. We compare our algorithm with the well-known Constraint Score, Fisher Score, and Laplacian Score algorithms in experiments on six UCI data sets using three different classifiers.
Experimental results show that, given only a few cannot-link constraints, Simba-sc achieves performance similar to or even better than that of Fisher Score with full class labels on all training data, and performs better than or comparably to Constraint Score.
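To make the role of pairwise constraints concrete, the following is a minimal sketch of a Constraint Score-style filter criterion: each feature is scored by the ratio of its squared differences over must-link pairs to those over cannot-link pairs, so that a good feature keeps same-class pairs close and different-class pairs far apart (lower scores are better). The function name, the exact ratio form, and the `eps` smoothing term are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def constraint_score(X, must_link, cannot_link, eps=1e-12):
    """Score each feature by a must-link / cannot-link ratio.

    X           : (n_samples, n_features) data matrix
    must_link   : list of (i, j) index pairs known to share a class
    cannot_link : list of (i, j) index pairs known to differ in class
    eps         : small constant to avoid division by zero (assumption)

    Lower scores indicate features that better separate the classes.
    """
    n_features = X.shape[1]
    scores = np.empty(n_features)
    for f in range(n_features):
        # Spread of the feature within same-class pairs (want small).
        ml = sum((X[i, f] - X[j, f]) ** 2 for i, j in must_link)
        # Spread of the feature across different-class pairs (want large).
        cl = sum((X[i, f] - X[j, f]) ** 2 for i, j in cannot_link)
        scores[f] = ml / (cl + eps)
    return scores
```

On a toy data set where feature 0 separates two classes and feature 1 varies independently of class, feature 0 receives the lower (better) score; selecting the k lowest-scoring features then yields a filter-style subset.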