An efficient feature selection approach for clustering: using a Gaussian mixture model of data dissimilarity

Authors:
Chieh-Yuan Tsai;Chuang-Cheng Chiu
Affiliations:
Industrial Engineering and Management Department, Yuan-Ze University, Taiwan, R.O.C.;Industrial Engineering and Management Department, Yuan-Ze University, Taiwan, R.O.C.
Venue:
ICCSA'07 Proceedings of the 2007 international conference on Computational science and its applications - Volume Part I
Year:
2007

Abstract

Rapid advances in computer and database technologies have enabled organizations to accumulate vast amounts of data. Such large data sets make data analysis more complicated. Feature selection is an effective dimensionality reduction technique that removes irrelevant, redundant, or noisy features. This research proposes a novel feature-selection measure that evaluates feature importance for clustering. The measure extracts information from the dissimilarity between pairs of data objects, since dissimilarity is a common criterion for deciding whether two objects belong to the same cluster. The probability distribution of the dissimilarity variable is modeled as a mixture of two Gaussian distributions, one for "intra-cluster" dissimilarity and one for "inter-cluster" dissimilarity. The means of the two Gaussians are estimated with the EM algorithm, and the difference between them is used as the measure for selecting important features for clustering. The effectiveness of the proposed measure is demonstrated in a set of experiments.
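
The idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how dissimilarity is defined or how a single feature is scored, so everything below is an assumption made for illustration: dissimilarity is taken as the absolute difference on one feature, each feature is scored on its own, and the names `fit_two_gaussians`, `feature_score`, and `demo_scores` are hypothetical. It is not the authors' implementation.

```python
import math
import random
from itertools import combinations


def fit_two_gaussians(values, iterations=100):
    """Fit a 1-D mixture of two Gaussians with EM; return (low_mean, high_mean)."""
    ordered = sorted(values)
    half = len(ordered) // 2
    # Assumed initialisation: means of the lower and upper halves of the data.
    means = [sum(ordered[:half]) / half,
             sum(ordered[half:]) / (len(ordered) - half)]
    overall_mean = sum(values) / len(values)
    overall_var = sum((v - overall_mean) ** 2 for v in values) / len(values)
    variances = [max(overall_var, 1e-6)] * 2
    weights = [0.5, 0.5]

    for _ in range(iterations):
        # E-step: responsibility of component 0 for every dissimilarity value.
        resp0 = []
        for v in values:
            dens = [
                weights[k]
                * math.exp(-((v - means[k]) ** 2) / (2 * variances[k]))
                / math.sqrt(2 * math.pi * variances[k])
                for k in range(2)
            ]
            total = dens[0] + dens[1]
            resp0.append(dens[0] / total if total > 0 else 0.5)

        # M-step: re-estimate weight, mean, and variance of each component.
        for k in range(2):
            resp = resp0 if k == 0 else [1.0 - r for r in resp0]
            mass = max(sum(resp), 1e-12)
            means[k] = sum(r * v for r, v in zip(resp, values)) / mass
            variances[k] = max(
                sum(r * (v - means[k]) ** 2 for r, v in zip(resp, values)) / mass,
                1e-6,
            )
            weights[k] = mass / len(values)

    return min(means), max(means)


def feature_score(column):
    """Gap between the two fitted means of the pairwise dissimilarities."""
    # Assumption: dissimilarity on one feature is the absolute difference.
    dissimilarities = [abs(a - b) for a, b in combinations(column, 2)]
    low_mean, high_mean = fit_two_gaussians(dissimilarities)
    return high_mean - low_mean


def demo_scores(seed=0, per_cluster=30):
    """Score one cluster-separating feature and one pure-noise feature."""
    rng = random.Random(seed)
    # Synthetic data: feature 0 separates two clusters, feature 1 does not.
    informative = ([rng.gauss(0.0, 1.0) for _ in range(per_cluster)]
                   + [rng.gauss(20.0, 1.0) for _ in range(per_cluster)])
    noise = [rng.gauss(0.0, 1.0) for _ in range(2 * per_cluster)]
    return feature_score(informative), feature_score(noise)


informative_score, noise_score = demo_scores()
print("informative feature score:", informative_score)
print("noise feature score:", noise_score)
```

Under these assumptions, a feature that separates clusters should produce a larger gap between the "intra-cluster" and "inter-cluster" means than a feature that carries only noise, so features could be ranked by this score.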