A weighting k-modes algorithm for subspace clustering of categorical data

Authors:
Fuyuan Cao;Jiye Liang;Deyu Li;Xingwang Zhao
Affiliations:
School of Computer and Information Technology, Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan, 030006 Shanxi, ... (all four authors)
Venue:
Neurocomputing
Year:
2013

Abstract

Traditional clustering algorithms treat all dimensions of an input data set equally. In high-dimensional data, however, points are often tightly clustered only within subspaces, so classes of objects are distinguished in subspaces rather than in the entire space. Subspace clustering extends traditional clustering by seeking clusters in different subspaces of a data set. This paper presents a weighting k-modes algorithm for subspace clustering of categorical data and analyzes its time complexity. The proposed algorithm adds a step to the k-modes clustering process that automatically computes the weight of every dimension in each cluster using complement entropy. The resulting attribute weights can be used to identify the subsets of important dimensions that characterize different clusters. The effectiveness of the proposed algorithm is demonstrated on both real and synthetic data sets.
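
The weighting step described in the abstract might be sketched as follows. This is only an illustration under stated assumptions, not the paper's formulation: complement entropy is taken as the sum of p(1 - p) over the category frequencies of one attribute within one cluster, and the mapping from entropy to weight (one minus the entropy, normalised to sum to 1) is a hypothetical choice, since the abstract does not give the weight formula. The function names are likewise invented for this sketch.

```python
from collections import Counter


def complement_entropy(values):
    """Complement entropy of one categorical column: sum of p * (1 - p)."""
    n = len(values)
    counts = Counter(values)
    return sum((c / n) * (1 - c / n) for c in counts.values())


def attribute_weights(cluster_rows):
    """Per-attribute weights for one cluster.

    ASSUMPTION: weight is proportional to (1 - complement entropy), so an
    attribute whose values are nearly constant inside the cluster gets a
    larger weight. The paper's actual weighting formula may differ.
    """
    n_attrs = len(cluster_rows[0])
    scores = [
        1.0 - complement_entropy([row[j] for row in cluster_rows])
        for j in range(n_attrs)
    ]
    total = sum(scores)  # always positive: complement entropy is below 1
    return [s / total for s in scores]


def weighted_mismatch(row, mode, weights):
    """Weighted simple-matching distance between an object and a cluster mode."""
    return sum(w for x, m, w in zip(row, mode, weights) if x != m)


# Toy cluster: attribute 0 is constant, attribute 1 varies.
cluster = [("a", "x"), ("a", "y"), ("a", "x"), ("a", "z")]
weights = attribute_weights(cluster)
```

In this toy cluster the constant first attribute receives the larger weight, which matches the idea that highly weighted dimensions mark the subspace in which a cluster lives. In a full k-modes loop, these weights would be recomputed for every cluster after each mode update and used in the assignment step through `weighted_mismatch`.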