K-centers algorithm for clustering mixed type data

Authors:
Wei-Dong Zhao; Wei-Hui Dai; Chun-Bin Tang
Affiliations:
Software School, Fudan University, Shanghai, China; School of Management, Fudan University, Shanghai, China; School of Management, Fudan University, Shanghai, China
Venue:
PAKDD'07: Proceedings of the 11th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
Year:
2007

Abstract

The K-modes and K-prototypes algorithms both use a frequency-based method to update centroids: they keep only the attribute value with the highest frequency and neglect all other attribute values, which reduces the accuracy of the clustering results. To solve this problem, this paper proposes the K-centers clustering algorithm for mixed-type data. As an extension of the K-prototypes algorithm, the hard and fuzzy K-centers algorithms take into account how attribute values with different frequencies affect clustering accuracy, and they use a new centroid update method. Experiments on many UCI machine learning databases show that the K-centers algorithm clusters categorical and mixed-type data more efficiently and effectively than the K-modes and K-prototypes algorithms.
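The abstract does not state the K-centers update rule itself, so the following Python sketch is illustrative only. It contrasts a K-modes-style mode update with one possible frequency-aware alternative. That alternative keeps the relative frequency of every categorical value in a cluster and scores an object by summing (1 - frequency of its value) over the attributes. This frequency scheme, the `gamma` weight on the categorical term (borrowed from K-prototypes), seeding with the first k records, and all function names (`mode_center`, `frequency_center`, `categorical_distance`, `mixed_distance`, `update_centers`, `k_centers`) are assumptions. They are not the authors' method.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical sketch: NOT the K-centers update rule from the paper.
# A mixed-type record: (numeric attributes, categorical attributes).
Record = Tuple[Tuple[float, ...], Tuple[str, ...]]
# Assumed categorical center: one value -> relative-frequency map per attribute.
CatCenter = List[Dict[str, float]]
Center = Tuple[Tuple[float, ...], CatCenter]


def mode_center(cat_rows: Sequence[Tuple[str, ...]]) -> Tuple[str, ...]:
    """K-modes-style update: keep only the most frequent value per attribute."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*cat_rows))


def frequency_center(cat_rows: Sequence[Tuple[str, ...]]) -> CatCenter:
    """Assumed frequency-aware update: keep every value with its relative frequency."""
    n = len(cat_rows)
    return [{v: c / n for v, c in Counter(col).items()} for col in zip(*cat_rows)]


def categorical_distance(row: Tuple[str, ...], center: CatCenter) -> float:
    """Sum over attributes of (1 - frequency of the row's value in the cluster)."""
    return sum(1.0 - freqs.get(v, 0.0) for v, freqs in zip(row, center))


def mixed_distance(rec: Record, center: Center, gamma: float) -> float:
    """K-prototypes-style cost: squared Euclidean part + gamma * categorical part."""
    num, cat = rec
    num_c, cat_c = center
    sq = sum((x - c) ** 2 for x, c in zip(num, num_c))
    return sq + gamma * categorical_distance(cat, cat_c)


def update_centers(records: List[Record], labels: List[int],
                   old_centers: List[Center]) -> List[Center]:
    """Recompute numeric means and categorical frequency maps per cluster."""
    centers: List[Center] = []
    for c, old in enumerate(old_centers):
        members = [r for r, lab in zip(records, labels) if lab == c]
        if not members:  # empty cluster: keep its previous center
            centers.append(old)
            continue
        nums = [m[0] for m in members]
        num_c = tuple(sum(col) / len(nums) for col in zip(*nums))
        centers.append((num_c, frequency_center([m[1] for m in members])))
    return centers


def k_centers(records: List[Record], k: int, gamma: float = 1.0,
              max_iter: int = 100) -> Tuple[List[int], List[Center]]:
    """Hard clustering loop; seeding with the first k records is an assumption."""
    centers: List[Center] = [(r[0], frequency_center([r[1]])) for r in records[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assign each record to the center with the smallest mixed cost.
        new_labels = [
            min(range(k), key=lambda c: mixed_distance(rec, centers[c], gamma))
            for rec in records
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        centers = update_centers(records, labels, centers)
    return labels, centers
```

As an example, consider a cluster whose categorical rows are (red, small), (red, small), and (red, large). The object (red, large) is at distance 1 from the mode center (red, small). Under the assumed frequency center it is at distance 2/3, because the center records that "large" occurs in one third of the cluster. This extra information is what a mode-only update discards.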