The K-modes and K-prototypes algorithms both update centroids with a frequency-based method that keeps only the attribute value with the highest frequency and neglects all other attribute values, which reduces the accuracy of the clustering results. To address this problem, this paper proposes the K-centers clustering algorithm for mixed-type data. The hard and fuzzy K-centers algorithms extend the K-prototypes algorithm with a new centroid update method that accounts for the effect of attribute values with different frequencies on clustering accuracy. Experiments on many UCI machine-learning databases show that the K-centers algorithm clusters categorical and mixed-type data more efficiently and effectively than the K-modes and K-prototypes algorithms.
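The limitation of the mode-based update described in the abstract can be illustrated with a small sketch. The abstract does not give the K-centers update rule, so the frequency-weighted centroid below is only an assumed stand-in that shows why keeping every value's frequency carries more information than keeping the mode alone; it is not the paper's method. The function names, the toy cluster, and the dissimilarity measures are all hypothetical.

```python
from collections import Counter


def mode_centroid(rows):
    """K-modes style update: keep only the most frequent value per attribute."""
    return [Counter(col).most_common(1)[0][0] for col in zip(*rows)]


def frequency_centroid(rows):
    """Assumed alternative: keep the relative frequency of every value."""
    n = len(rows)
    return [{v: c / n for v, c in Counter(col).items()} for col in zip(*rows)]


def mode_distance(row, centroid):
    """Simple matching dissimilarity: count attributes that differ from the mode."""
    return sum(1 for v, m in zip(row, centroid) if v != m)


def frequency_distance(row, centroid):
    """Assumed dissimilarity: 1 minus the in-cluster frequency of each value."""
    return sum(1.0 - freqs.get(v, 0.0) for v, freqs in zip(row, centroid))


# Toy cluster with two categorical attributes (colour, size).
cluster = [
    ("red", "small"),
    ("red", "large"),
    ("blue", "small"),
    ("red", "small"),
]

modes = mode_centroid(cluster)
freqs = frequency_centroid(cluster)

seen = ("blue", "small")     # "blue" occurs in the cluster, but is not the mode
unseen = ("green", "small")  # "green" never occurs in the cluster

# The mode-based centroid cannot tell these two objects apart.
print(mode_distance(seen, modes), mode_distance(unseen, modes))

# The frequency-based centroid places the object with the seen value closer.
print(frequency_distance(seen, freqs), frequency_distance(unseen, freqs))
```

With the mode-based centroid both objects are at the same distance, because all non-modal values are discarded. Under the assumed frequency-based centroid, the object whose value already occurs in the cluster is closer than the one whose value does not.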