Algorithms for clustering data.
Symbolic clustering using a new dissimilarity measure. Pattern Recognition.
A conceptual version of the K-means algorithm. Pattern Recognition Letters.
Clustering Algorithms.
Machine Learning and Data Mining: Methods and Applications.
Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values. Data Mining and Knowledge Discovery.
Experiments with Incremental Concept Formation: UNIMEM. Machine Learning.
Knowledge Acquisition Via Incremental Conceptual Clustering. Machine Learning.
Refining Initial Points for K-Means Clustering. ICML '98: Proceedings of the Fifteenth International Conference on Machine Learning.
Efficient and Effective Clustering Methods for Spatial Data Mining. VLDB '94: Proceedings of the 20th International Conference on Very Large Data Bases.
TCSOM: Clustering Transactions Using Self-Organizing Map. Neural Processing Letters.
Performing clustering analysis on collaborative models. Intelligent Data Analysis.
k-ANMI: A mutual information based clustering algorithm for categorical data. Information Fusion.
A new initialization method for categorical data clustering. Expert Systems with Applications: An International Journal.
Computation of initial modes for K-modes clustering algorithm using evidence accumulation. IJCAI'07: Proceedings of the 20th International Joint Conference on Artificial Intelligence.
A new initialization method for clustering categorical data. PAKDD'07: Proceedings of the 11th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining.
A cluster centers initialization method for clustering categorical data. Expert Systems with Applications: An International Journal.
Attribute value weighting in k-modes clustering. Expert Systems with Applications: An International Journal.
The original k-means clustering algorithm is designed to work primarily on numeric data sets, which prevents it from being applied directly to the categorical data found in many data mining applications. The k-modes algorithm [Z. Huang, Clustering large data sets with mixed numeric and categorical values, in: Proceedings of the First Pacific Asia Knowledge Discovery and Data Mining Conference, World Scientific, Singapore, 1997, pp. 21-34] extended the k-means paradigm to categorical data by updating the cluster modes with a frequency-based method, rather than following the k-means approach of minimizing a numerically valued cost. However, like most data clustering algorithms, k-modes requires the initial points (modes) of the clusters to be preset or selected at random, and different initial points often lead to considerably different clustering results. In this paper we present an experimental study on applying Bradley and Fayyad's iterative initial-point refinement algorithm to k-modes clustering in order to improve the accuracy and repeatability of the clustering results [cf. P. Bradley, U. Fayyad, Refining initial points for k-means clustering, in: Proceedings of the 15th International Conference on Machine Learning, Morgan Kaufmann, Los Altos, CA, 1998]. Experiments show that k-modes clustering with refined initial points produces higher-precision results much more reliably than random selection without refinement, which makes the refinement process applicable to many data mining applications with categorical data.
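The two ideas in the abstract, k-modes clustering and initial-point refinement, can be sketched together in Python. This is a minimal sketch, not the authors' implementation. It assumes Huang's simple matching dissimilarity and frequency-based mode update, and it follows the outline of Bradley and Fayyad's refinement: cluster several small random subsamples, pool the resulting modes, re-cluster the pooled modes starting from each subsample solution, and keep the result with the lowest distortion over the pooled modes. The function names, the parameters `n_subsamples`, `frac`, and `seed`, and the rule that an empty cluster keeps its previous mode are illustrative assumptions; the paper's own treatment of empty clusters and its subsample settings may differ.

```python
import random
from collections import Counter


def mismatch(a, b):
    # Simple matching dissimilarity: number of attributes on which a and b differ.
    return sum(x != y for x, y in zip(a, b))


def assign(data, modes):
    # Index of the nearest mode for every record (ties go to the lowest index).
    return [min(range(len(modes)), key=lambda j: mismatch(x, modes[j])) for x in data]


def cost(data, modes):
    # k-modes objective: total dissimilarity of each record to its nearest mode.
    return sum(min(mismatch(x, m) for m in modes) for x in data)


def k_modes(data, init_modes, max_iter=100):
    modes = [tuple(m) for m in init_modes]
    for _ in range(max_iter):
        labels = assign(data, modes)
        new_modes = []
        for j, old in enumerate(modes):
            members = [x for x, label in zip(data, labels) if label == j]
            if not members:
                # Assumption (hypothetical rule): an empty cluster keeps its old mode.
                new_modes.append(old)
                continue
            # Frequency-based update: most frequent category of each attribute.
            new_modes.append(tuple(Counter(col).most_common(1)[0][0]
                                   for col in zip(*members)))
        if new_modes == modes:
            break  # assignments can no longer change
        modes = new_modes
    return modes


def refine_initial_modes(data, k, n_subsamples=10, frac=0.1, seed=0):
    # Hypothetical helper following Bradley and Fayyad's outline, adapted to modes.
    rng = random.Random(seed)
    size = max(k, int(frac * len(data)))
    solutions = []
    for _ in range(n_subsamples):
        sample = rng.sample(data, size)
        # Cluster each small subsample from a random start.
        solutions.append(k_modes(sample, rng.sample(sample, k)))
    # Pool the modes found on all subsamples.
    pooled = [m for sol in solutions for m in sol]
    # Re-cluster the pooled modes once per subsample solution, using it as the start,
    # and keep the candidate with the lowest distortion over the pooled modes.
    candidates = [k_modes(pooled, sol) for sol in solutions]
    return min(candidates, key=lambda modes: cost(pooled, modes))


if __name__ == "__main__":
    # Hypothetical toy data: two groups of 3-attribute categorical records plus noise.
    data = [("a", "x", "p")] * 30 + [("b", "y", "q")] * 30 + [("a", "y", "p")] * 5
    start = refine_initial_modes(data, k=2, seed=42)
    final_modes = k_modes(data, start)
    print(final_modes, cost(data, final_modes))
```

The second stage clusters only the pooled modes (at most `n_subsamples * k` points) rather than the full data set, so the refinement adds little cost compared with the final k-modes run on all records.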