An entropy weighting mixture model for subspace clustering of high-dimensional data

Authors:
Liuqing Peng;Junying Zhang
Affiliations:
School of Computer Science and Technology, Xidian University, 2, Taibai Road, Xi'an 710071, China;School of Computer Science and Technology, Xidian University, 2, Taibai Road, Xi'an 710071, China
Venue:
Pattern Recognition Letters
Year:
2011

Abstract

In high-dimensional data, clusters of objects usually exist in subspaces; moreover, different clusters are likely to have different shape volumes. Most existing methods for clustering high-dimensional data, however, consider only the former factor and ignore the latter by assuming the same shape volume for all clusters. In this paper we propose a new Gaussian mixture model (GMM) type algorithm for discovering clusters with various shape volumes in subspaces. We extend the GMM clustering method to calculate a local weight vector as well as a local variance within each cluster, and use the weight and variance values to capture the main properties that discriminate between clusters, including subsets of relevant dimensions and shape volumes. This is achieved by introducing the negative entropy of the weight vectors, along with adaptively chosen coefficients, into the objective function of the extended GMM. Experimental results on both synthetic and real datasets show that the proposed algorithm outperforms its competitors, especially when applied to high-dimensional datasets.
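
To illustrate how a negative-entropy term on a weight vector can yield soft per-cluster dimension weights, here is a minimal sketch. It is not the paper's algorithm: the abstract gives no formulas, so the sketch assumes a generic per-cluster objective of the form sum_j w_j * D_j + gamma * sum_j w_j * log(w_j), with the weights constrained to sum to one. Under that assumption the minimizer is w_j proportional to exp(-D_j / gamma). The names entropy_weights, dispersion, and gamma are hypothetical, and gamma is a fixed constant here, whereas the abstract describes adaptively chosen coefficients.

```python
import numpy as np


def entropy_weights(dispersion, gamma):
    """Per-cluster dimension weights under an entropy-regularized objective.

    Assumption (not taken from the paper): for each cluster k we minimize
        sum_j w[k, j] * dispersion[k, j] + gamma * sum_j w[k, j] * log(w[k, j])
    subject to sum_j w[k, j] == 1. The closed-form minimizer is a softmax
    of -dispersion / gamma along the dimension axis.

    dispersion: array of shape (n_clusters, n_dims), within-cluster spread
                of each dimension (hypothetical input).
    gamma:      positive scalar; larger values push weights toward uniform.
    """
    dispersion = np.asarray(dispersion, dtype=float)
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    scores = -dispersion / gamma
    # Subtract the row maximum before exponentiating for numerical stability.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    # Cluster 0 is equally spread in all dimensions; cluster 1 is tight
    # only in dimension 0, so that dimension should receive the most weight.
    d = [[1.0, 1.0, 1.0],
         [0.1, 5.0, 5.0]]
    print(entropy_weights(d, gamma=1.0))
```

In this sketch, dimensions along which a cluster is compact receive larger weights, which is one way a weight vector can indicate the subset of relevant dimensions for that cluster.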