Categorical Data Clustering Using the Combinations of Attribute Values

  • Authors:
  • Hee-Jung Do; Jae-Yearn Kim

  • Affiliations:
  • Department of Industrial Engineering, Hanyang University, Sungdong-gu, Seoul, Korea 133-791; Department of Industrial Engineering, Hanyang University, Sungdong-gu, Seoul, Korea 133-791

  • Venue:
  • ICCSA '08 Proceedings of the international conference on Computational Science and Its Applications, Part II
  • Year:
  • 2008

Abstract

Clustering is an important technique for exploratory data analysis. Although most earlier clustering algorithms focused on numerical data, real-world problems and data mining applications frequently involve categorical data. We propose a new clustering algorithm for categorical data based on the frequency of attribute value combinations. For each record, the algorithm enumerates all combinations of its attribute values, each of which is a subset of the record's attribute values, and then groups the records using the frequency of these combinations. Because the algorithm considers every subset of a record's attribute values, the records in a cluster share not only similar attribute value sets but also strongly associated attribute values. We evaluated the algorithm on real and synthetic data sets, and the experimental results demonstrate its effectiveness.
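
The abstract does not spell out how combinations are enumerated or how their frequencies drive the grouping, so the following is only a minimal sketch of the counting step, under stated assumptions. It assumes each record is a dictionary mapping attribute names to categorical values and that a "combination" is any non-empty subset of a record's (attribute, value) pairs. The function names `value_combinations` and `combination_frequencies` and the toy records are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def value_combinations(record):
    """Yield every non-empty combination of (attribute, value) pairs in a record.

    Assumption: a record is a dict of attribute name -> categorical value.
    Pairs are sorted by attribute name so equal combinations get equal keys.
    """
    items = sorted(record.items())
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield combo


def combination_frequencies(records):
    """Count in how many records each attribute-value combination occurs."""
    counts = Counter()
    for record in records:
        counts.update(value_combinations(record))
    return counts


# Hypothetical toy data, for illustration only.
records = [
    {"color": "red", "shape": "round", "size": "small"},
    {"color": "red", "shape": "round", "size": "large"},
    {"color": "blue", "shape": "square", "size": "large"},
]

freq = combination_frequencies(records)
red_round = (("color", "red"), ("shape", "round"))
print(freq[red_round])  # number of records containing both color=red and shape=round
```

A record with m attributes yields 2^m - 1 non-empty combinations, so this exhaustive enumeration is practical only for a modest number of attributes. How the resulting frequencies are turned into cluster assignments is not described in the abstract and is deliberately left out of the sketch.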