A cluster centers initialization method for clustering categorical data

Authors:
Liang Bai; Jiye Liang; Chuangyin Dang; Fuyuan Cao
Affiliations:
Liang Bai, Jiye Liang, Fuyuan Cao: Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, School of Computer and Information Technology, Shanxi University, Taiyuan 030006, Shanxi, ...
Chuangyin Dang: Department of Manufacturing Engineering and Engineering Management, City University of Hong Kong, Hong Kong
Venue:
Expert Systems with Applications: An International Journal
Year:
2012

Abstract

The leading partitional clustering technique, k-modes, is one of the most computationally efficient clustering methods for categorical data. However, because the k-modes algorithm can converge to numerous local minima, its performance depends strongly on the initial cluster centers. Currently, most cluster center initialization methods are designed for numerical data. Because categorical data lack the geometric structure of numerical data, these methods are not applicable to categorical data. This paper proposes a novel initialization method for categorical data and applies it to the k-modes algorithm. The method combines distance and density to select the initial cluster centers, overcoming shortcomings of existing initialization methods for categorical data. Experimental results show that the proposed initialization method is effective and, because its time complexity is linear in the number of data objects, can be applied to large data sets.
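The abstract says that the method combines distance and density to pick initial centers for k-modes, but it does not give the formulas. The Python sketch below illustrates the general idea under stated assumptions. The k-modes part uses the standard simple matching dissimilarity and the per-attribute mode update. The seeding part is hypothetical. It takes the densest object first, then repeatedly picks the object that maximizes density times distance to its nearest already chosen center. The density definition (average relative frequency of an object's attribute values), the product score, and all function names are illustrative assumptions, not the authors' exact method.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Record = Tuple[str, ...]


def matching_distance(x: Sequence[str], y: Sequence[str]) -> int:
    """Simple matching dissimilarity used by k-modes: count of mismatched attributes."""
    return sum(a != b for a, b in zip(x, y))


def densities(data: List[Record]) -> List[float]:
    """ASSUMED density: average relative frequency of an object's attribute values.

    One frequency table per attribute keeps this O(n * m).
    """
    n, m = len(data), len(data[0])
    freq = [Counter(row[j] for row in data) for j in range(m)]
    return [sum(freq[j][row[j]] for j in range(m)) / (n * m) for row in data]


def init_centers(data: List[Record], k: int) -> List[Record]:
    """Hypothetical density-distance seeding; NOT the paper's exact selection rule."""
    if not 1 <= k <= len(data):
        raise ValueError("k must satisfy 1 <= k <= len(data)")
    dens = densities(data)
    chosen = [max(range(len(data)), key=lambda i: dens[i])]  # densest object first
    # Distance from every object to its nearest chosen center, updated incrementally.
    nearest = [matching_distance(row, data[chosen[0]]) for row in data]
    while len(chosen) < k:
        candidates = [i for i in range(len(data)) if i not in chosen]
        # Assumed score: dense objects that are far from all current centers win.
        best = max(candidates, key=lambda i: dens[i] * nearest[i])
        chosen.append(best)
        nearest = [min(d, matching_distance(row, data[best])) for d, row in zip(nearest, data)]
    return [data[i] for i in chosen]


def mode(rows: List[Record]) -> Record:
    """Per-attribute most frequent value (the k-modes cluster center update)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(data: List[Record], k: int, max_iter: int = 100) -> Tuple[List[int], List[Record]]:
    centers = init_centers(data, k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assign each object to the center with the fewest attribute mismatches.
        new_labels = [min(range(k), key=lambda c: matching_distance(row, centers[c]))
                      for row in data]
        if new_labels == labels:  # assignments stable: converged, possibly to a local minimum
            break
        labels = new_labels
        for c in range(k):
            members = [row for row, lab in zip(data, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = mode(members)
    return labels, centers


if __name__ == "__main__":
    data = [("a", "x", "p"), ("a", "x", "p"), ("a", "x", "q"),
            ("b", "y", "r"), ("b", "y", "r"), ("b", "z", "r")]
    labels, centers = k_modes(data, 2)
    print(labels)   # [0, 0, 0, 1, 1, 1]
    print(centers)  # [('a', 'x', 'p'), ('b', 'y', 'r')]
```

In this sketch each seeding round scans all n objects over m attributes, so seeding costs O(nmk) time. That cost is linear in the number of objects, which is the scaling property the abstract emphasizes for the proposed method.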