Rapid and brief communication: A k-populations algorithm for clustering categorical data

Authors:
Dae-Won Kim; KiYoung Lee; Doheon Lee; Kwang H. Lee
Affiliations:
Department of BioSystems and Advanced Information Technology Research Center, Korea Advanced Institute of Science and Technology, Guseong-dong, Yuseong-gu 305-701, Daejeon, Republic of Korea
Department of Electrical Engineering and Computer Science, Korea Advanced Institute of Science and Technology, Guseong-dong, Yuseong-gu 305-701, Daejeon, Republic of Korea
Department of BioSystems and Advanced Information Technology Research Center, Korea Advanced Institute of Science and Technology, Guseong-dong, Yuseong-gu 305-701, Daejeon, Republic of Korea
Department of BioSystems and Advanced Information Technology Research Center, Korea Advanced Institute of Science and Technology, Guseong-dong, Yuseong-gu 305-701, Daejeon, Republic of Korea and D ...
Venue:
Pattern Recognition
Year:
2005

Citing 3

Symbolic clustering using a new dissimilarity measure. Pattern Recognition
Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values. Data Mining and Knowledge Discovery
A fuzzy k-modes algorithm for clustering categorical data. IEEE Transactions on Fuzzy Systems

Cited 5

A k-mean clustering algorithm for mixed numeric and categorical data. Data & Knowledge Engineering
Adjusting the clustering results referencing an external set. ICSI'10 Proceedings of the First international conference on Advances in Swarm Intelligence - Volume Part II
Attribute value weighting in k-modes clustering. Expert Systems with Applications: An International Journal
Context Oriented Analysis of Interest Reflection of Tweeted Webpages based on Browsing Behavior. Proceedings of International Conference on Information Integration and Web-based Applications & Services
Central clustering of categorical data with automated feature weighting. IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence

Abstract

This paper extends the conventional k-modes-type algorithms for clustering categorical data by representing each cluster with a population instead of the hard-type centroid those algorithms use. A population-based centroid representation preserves the uncertainty inherent in a data set for as long as possible before actual decisions are made. In various experiments, the k-populations algorithm gave markedly better clustering results.
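
The contrast between a hard centroid and a population-based centroid can be illustrated with a minimal Python sketch. This is not the paper's formulation: the abstract does not give the update rules or the dissimilarity measure, so the function names and the frequency-based dissimilarity below are assumptions chosen only to show the idea.

```python
from collections import Counter

def mode_centroid(cluster):
    """Hard centroid in the k-modes style: most frequent category per attribute."""
    return [Counter(col).most_common(1)[0][0] for col in zip(*cluster)]

def population_centroid(cluster):
    """Soft centroid: relative frequency of every category per attribute."""
    n = len(cluster)
    return [{v: c / n for v, c in Counter(col).items()} for col in zip(*cluster)]

def mismatch_to_mode(obj, mode):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(a != b for a, b in zip(obj, mode))

def dissimilarity_to_population(obj, population):
    """Assumed measure (not taken from the paper): per attribute, one minus
    the share of cluster members that carry the object's category."""
    return sum(1.0 - dist.get(v, 0.0) for v, dist in zip(obj, population))

# Toy cluster with two categorical attributes (colour, size).
cluster = [("red", "small"), ("red", "large"), ("blue", "large"), ("red", "large")]
mode = mode_centroid(cluster)
population = population_centroid(cluster)

# The mode discards the minority category "blue"; the population keeps it.
for obj in [("blue", "large"), ("green", "large")]:
    print(obj, mismatch_to_mode(obj, mode), dissimilarity_to_population(obj, population))
```

Against the mode, both test objects are one mismatch away. Against the population, the object whose colour already occurs in the cluster is closer than the one whose colour never occurs, which is the kind of information a hard centroid loses.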