An initialization method to simultaneously find initial cluster centers and the number of clusters for clustering categorical data

Authors:
Liang Bai; Jiye Liang; Chuangyin Dang
Affiliations:
Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, School of Computer and Information Technology, Shanxi University, Taiyuan, 030006 Shanxi, ...;Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, School of Computer and Information Technology, Shanxi University, Taiyuan, 030006 Shanxi, ...;Department of Manufacturing Engineering and Engineering Management, City University of Hong Kong, Hong Kong
Venue:
Knowledge-Based Systems
Year:
2011

Abstract

The leading partitional clustering technique, k-modes, is one of the most computationally efficient clustering methods for categorical data. However, the clustering performance of k-modes-type algorithms depends on the initial cluster centers, and the number of clusters must be known or given in advance. This paper proposes a novel initialization method for categorical data that can be used with k-modes-type algorithms. The proposed method not only obtains good initial cluster centers but also provides a criterion for finding candidates for the number of clusters. The performance and scalability of the proposed method have been studied on real data sets. The experimental results illustrate that the proposed method is effective and, because its time complexity is linear in the number of data points, can be applied to large data sets.
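
To make the dependence on initial centers and on the number of clusters concrete, the following is a minimal sketch of a standard k-modes loop, assuming the usual simple-matching dissimilarity and per-attribute mode update. It is not the initialization method proposed in the paper, which the abstract does not specify; the function names `matching_dissimilarity` and `k_modes` and the toy records are hypothetical and chosen only for illustration.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Count the attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(records, initial_modes, max_iter=100):
    """Plain k-modes sketch (assumed standard form, not the paper's method).

    The number of clusters is fixed by len(initial_modes), and the result
    depends on which initial modes are supplied.
    """
    modes = [tuple(m) for m in initial_modes]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each record goes to its nearest mode.
        new_labels = [
            min(range(len(modes)),
                key=lambda j: matching_dissimilarity(r, modes[j]))
            for r in records
        ]
        if new_labels == labels:
            break  # assignments are stable, so the modes will not change
        labels = new_labels
        # Update step: each mode takes the most frequent category per attribute.
        for j in range(len(modes)):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:  # an empty cluster keeps its previous mode
                modes[j] = tuple(
                    Counter(column).most_common(1)[0][0]
                    for column in zip(*members)
                )
    return labels, modes


# Toy categorical data set (made up for illustration only).
records = [("a", "x"), ("a", "x"), ("a", "y"),
           ("b", "z"), ("b", "z"), ("c", "z")]
labels, modes = k_modes(records, initial_modes=[("a", "x"), ("b", "z")])
```

Both the caller-supplied `initial_modes` and its length (the number of clusters) are inputs to this loop, which is the gap an initialization method such as the one described above is meant to fill. Each iteration makes one pass over the records, so the cost per iteration grows linearly with the number of data points.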