A k-means type clustering algorithm for subspace clustering of mixed numeric and categorical datasets

Authors:
Amir Ahmad; Lipika Dey
Affiliations:
Faculty of Computing and Information Technology, King Abdulaziz University, Rabigh, Saudi Arabia; Innovation Labs, Tata Consultancy Services, New Delhi, India
Venue:
Pattern Recognition Letters
Year:
2011

Abstract

Almost all subspace clustering algorithms proposed so far are designed for numeric datasets. In this paper, we present a k-means type clustering algorithm that finds clusters in data subspaces of mixed numeric and categorical datasets. In this method, we compute the contribution of each attribute to the different clusters, and we propose a new cost function for a k-means type algorithm. One advantage of the algorithm is that its computational complexity is linear in the number of data points. The algorithm is also useful for describing cluster formation in terms of the attributes' contributions to the different clusters. The algorithm is tested on various synthetic and real datasets to show its effectiveness. The clustering results are explained using the attribute weights in each cluster and are compared with published results.
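The abstract does not state the new cost function, so the following is only a minimal sketch of the general idea (a k-means type loop with cluster-specific attribute weights on mixed data), not the authors' algorithm. It assumes numeric attributes are already scaled to [0, 1], represents a categorical center by the relative frequencies of its values (distance = 1 minus the frequency of the record's value in the cluster), and swaps in the entropy-weighting update used by EWKM, w_kj proportional to exp(-D_kj / gamma), where D_kj is the dispersion of attribute j in cluster k. The names subspace_kprototypes, compute_center, attr_distance and the parameter gamma are hypothetical.

```python
import math
import random
from collections import Counter


def compute_center(members, num_idx, cat_idx, m):
    """Hypothetical center: mean for numeric attributes, value frequencies for categorical ones."""
    center = [None] * m
    n = len(members)
    for j in num_idx:
        center[j] = sum(r[j] for r in members) / n
    for j in cat_idx:
        counts = Counter(r[j] for r in members)
        center[j] = {v: c / n for v, c in counts.items()}
    return center


def attr_distance(row, center, j, is_numeric):
    """Per-attribute distance: squared difference, or 1 - frequency of the row's category."""
    if is_numeric:
        return (row[j] - center[j]) ** 2
    return 1.0 - center[j].get(row[j], 0.0)


def subspace_kprototypes(data, num_idx, cat_idx, k, gamma=1.0, max_iter=20, seed=0):
    """Hypothetical sketch: k-means type clustering with per-cluster attribute weights."""
    rng = random.Random(seed)
    m = len(data[0])
    numeric = set(num_idx)
    attrs = list(num_idx) + list(cat_idx)

    def dist(row, c):
        # Weighted sum of per-attribute distances, using cluster c's own weights.
        return sum(weights[c][j] * attr_distance(row, centers[c], j, j in numeric) for j in attrs)

    centers = [compute_center([r], num_idx, cat_idx, m) for r in rng.sample(data, k)]
    # Start from uniform weights over the clustered attributes.
    weights = [[1.0 / len(attrs) if j in attrs else 0.0 for j in range(m)] for _ in range(k)]
    labels = [-1] * len(data)
    for _ in range(max_iter):
        # Step 1: assign each record to the nearest center under that cluster's weights.
        new_labels = [min(range(k), key=lambda c: dist(row, c)) for row in data]
        if new_labels == labels:
            break  # assignments are stable, so centers and weights would not change
        labels = new_labels
        for c in range(k):
            members = [row for row, lab in zip(data, labels) if lab == c]
            if not members:
                continue  # empty cluster: keep its previous center and weights
            # Step 2: recompute the center from the current members.
            centers[c] = compute_center(members, num_idx, cat_idx, m)
            # Step 3: entropy-weighting update, w_cj proportional to exp(-D_cj / gamma).
            disp = {j: sum(attr_distance(r, centers[c], j, j in numeric) for r in members)
                    for j in attrs}
            low = min(disp.values())  # shift for numerical stability; cancels on normalising
            expw = {j: math.exp(-(disp[j] - low) / gamma) for j in attrs}
            total = sum(expw.values())
            weights[c] = [expw.get(j, 0.0) / total for j in range(m)]
    return labels, centers, weights
```

Each pass of this sketch costs O(n·k·m) for n records, k clusters and m attributes, so it grows linearly with the number of data points. A smaller gamma concentrates a cluster's weight on its lowest-dispersion attributes; reading off those high-weight attributes is one way to describe a cluster's subspace, in the spirit of the attribute-weight explanations mentioned in the abstract.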