Model-based clustering of high-dimensional data: Variable selection versus facet determination

Authors:
Leonard K. M. Poon;Nevin L. Zhang;Tengfei Liu;April H. Liu
Affiliations:
Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong, China;Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong, China;Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong, China;Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong, China
Venue:
International Journal of Approximate Reasoning
Year:
2013

Abstract

Variable selection is an important problem for cluster analysis of high-dimensional data. It is also a difficult one. The difficulty originates not only from the lack of class information but also from the fact that high-dimensional data are often multifaceted and can be meaningfully clustered in multiple ways. In such a case, the effort to find one subset of attributes that presumably gives the "best" clustering may be misguided. It makes more sense to identify the various facets of a data set (each based on a subset of attributes), cluster the data along each one, and present the results to domain experts for appraisal and selection. In this paper, we propose a generalization of Gaussian mixture models and demonstrate its ability to automatically identify natural facets of data and to cluster the data along each of those facets simultaneously. We present empirical results showing that facet determination usually leads to better clustering results than variable selection.
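
The idea that one data set can be meaningfully clustered in more than one way can be illustrated with a minimal sketch. It is not the model proposed in the paper: the paper's model identifies the facets automatically, whereas here the two facets are fixed by hand as an assumption, each consisting of a single attribute. The function name `fit_gmm_1d`, the toy data, and the deterministic initialisation are all hypothetical choices made for illustration. A separate two-component Gaussian mixture is fitted to each facet with plain EM.

```python
import math

def fit_gmm_1d(xs, iters=50):
    """Fit a two-component 1-D Gaussian mixture with EM; return hard labels."""
    mu = [min(xs), max(xs)]   # deterministic initialisation (assumption)
    var = [1.0, 1.0]
    w = [0.5, 0.5]
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point
        resp = []
        for x in xs:
            p = [
                w[k] * math.exp(-(x - mu[k]) ** 2 / (2 * var[k]))
                / math.sqrt(2 * math.pi * var[k])
                for k in range(2)
            ]
            s = p[0] + p[1]
            resp.append([p[0] / s, p[1] / s])
        # M-step: re-estimate weights, means, and variances
        for k in range(2):
            nk = sum(r[k] for r in resp)
            w[k] = nk / len(xs)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            v = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            var[k] = max(v, 1e-3)  # floor keeps the variance away from zero
    return [0 if r[0] >= r[1] else 1 for r in resp]

# Hypothetical toy data: eight objects described by two attributes.
# Each attribute is treated as its own facet (fixed by hand, not learned).
attr_a = [0.0, 0.2, 0.4, 0.6, 5.0, 5.2, 5.4, 5.6]
attr_b = [0.0, 5.0, 0.2, 5.2, 0.4, 5.4, 0.6, 5.6]

labels_a = fit_gmm_1d(attr_a)  # partition along facet A
labels_b = fit_gmm_1d(attr_b)  # partition along facet B
print(labels_a)
print(labels_b)
```

The two facets yield different partitions of the same eight objects, and neither one is the single "best" clustering. Selecting only one attribute subset would discard one of the two.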