A correlation-based model for unsupervised feature selection

Authors:
Michael Edward Houle; Nizar Grira
Affiliations:
National Institute of Informatics, Tokyo, Japan; National Institute of Informatics, Tokyo, Japan
Venue:
Proceedings of the sixteenth ACM conference on information and knowledge management
Year:
2007

Abstract

We propose a new model for feature evaluation and selection that assesses the propensity of the features to support two-set classification. For each item of the data set, the collection of features induces a ranking (ordered list) of the remaining items. The evaluation criterion favors features that result in the most consistent discrimination between relevant and non-relevant items within these ranked lists. The discrimination boundaries within a single list are determined combinatorially, according to the degree of correlation among the relevant sets of its members. The model makes no special assumptions about the nature of the data. A selection heuristic based on the model, using sequential forward generation, is also proposed, and an experimental comparison is made with other unsupervised feature selection methods.
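
The abstract gives no formulas, so the following is only a minimal Python sketch of what sequential forward generation over ranking-induced neighborhoods might look like. It is not the authors' criterion. The names `knn_sets`, `neighborhood_consistency`, and `forward_select`, the fixed neighborhood size `k`, and the Jaccard-overlap score are all assumptions made for illustration. In particular, the sketch treats the top `k` entries of each ranked list as the "relevant" set, whereas the paper determines the boundary combinatorially.

```python
from math import dist


def knn_sets(data, features, k):
    """For each item, return the index set of its k nearest other items,
    ranked by Euclidean distance restricted to the given feature indices."""
    proj = [[row[f] for f in features] for row in data]
    result = []
    for i, p in enumerate(proj):
        # Ranked list of the remaining items; ties broken by index.
        ranked = sorted((j for j in range(len(proj)) if j != i),
                        key=lambda j: (dist(p, proj[j]), j))
        result.append(set(ranked[:k]))
    return result


def neighborhood_consistency(data, features, k=2):
    """Hypothetical stand-in score (NOT the paper's criterion): the mean
    Jaccard overlap between each item's neighborhood (item included) and
    the neighborhoods of the items ranked in its top k."""
    nn = knn_sets(data, features, k)
    closed = [s | {i} for i, s in enumerate(nn)]
    total, count = 0.0, 0
    for i, s in enumerate(nn):
        for j in s:
            total += len(closed[i] & closed[j]) / len(closed[i] | closed[j])
            count += 1
    return total / count if count else 0.0


def forward_select(data, score, max_features):
    """Sequential forward generation: start from the empty set and greedily
    add the feature that most improves the score; stop when none does."""
    n_features = len(data[0])
    selected, best = [], float("-inf")
    while len(selected) < min(max_features, n_features):
        candidates = [f for f in range(n_features) if f not in selected]
        scored = [(score(data, selected + [f]), f) for f in candidates]
        top_score, top_f = max(scored, key=lambda t: t[0])
        if top_score <= best:
            break  # no candidate strictly improves the current subset
        selected.append(top_f)
        best = top_score
    return selected, best


# Toy data (made up): feature 0 separates two groups, feature 1 is noise.
data = [
    [0.0, 5.0], [0.1, 0.0], [0.2, 9.0],
    [10.0, 1.0], [10.1, 8.0], [10.2, 3.0],
]
selected, best = forward_select(data, neighborhood_consistency, max_features=2)
print(selected, best)
```

On this toy data the search keeps only feature 0, since adding the noisy feature cannot raise the stand-in score any further. Passing the score as a function keeps the greedy search separate from the evaluation criterion, so a different criterion could be substituted without changing `forward_select`.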