k-ANMI: A mutual information based clustering algorithm for categorical data

  • Authors:
  • Zengyou He; Xiaofei Xu; Shengchun Deng

  • Affiliations:
  • Department of Computer Science and Engineering, Harbin Institute of Technology, 92 West Dazhi Street, P.O. Box 315, 150001, PR China (all authors)

  • Venue:
  • Information Fusion
  • Year:
  • 2008

Abstract

Clustering categorical data is an integral part of data mining and has attracted much attention recently. In this paper, we present k-ANMI, a new efficient algorithm for clustering categorical data. The k-ANMI algorithm works in a manner similar to the popular k-means algorithm, and the quality of the clustering at each step is evaluated using a mutual-information-based criterion, the average normalized mutual information (ANMI), borrowed from cluster ensembles. The algorithm is easy to implement, requiring multiple hash tables as its only major data structure. Experimental results on real datasets show that the k-ANMI algorithm is competitive with state-of-the-art categorical data clustering algorithms with respect to clustering accuracy.
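
To make the idea in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that each categorical attribute is treated as a partition of the objects, that normalized mutual information takes the form I(X;Y) / sqrt(H(X) H(Y)) commonly used in the cluster ensemble literature, and that ANMI is the mean of this value over all attributes. The function names (entropy, mutual_information, nmi, anmi, k_anmi_sketch) and the greedy reassignment loop are illustrative assumptions. The sketch recomputes the objective from scratch for every candidate move, whereas the abstract indicates that the actual algorithm relies on hash tables for efficiency.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a sequence of categorical labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two labelings of the same objects."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    count_ab = Counter(zip(a, b))  # hash table of joint counts
    return sum(
        (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in count_ab.items()
    )


def nmi(a, b):
    """Assumed normalization: I(a;b) / sqrt(H(a) * H(b)); 0 if either entropy is 0."""
    ha, hb = entropy(a), entropy(b)
    if ha == 0 or hb == 0:
        return 0.0
    return mutual_information(a, b) / math.sqrt(ha * hb)


def anmi(labels, rows):
    """Average NMI between a clustering and every attribute column of `rows`."""
    columns = list(zip(*rows))
    return sum(nmi(labels, col) for col in columns) / len(columns)


def k_anmi_sketch(rows, k, init_labels, max_sweeps=20):
    """k-means-like local search (hypothetical): move one object at a time
    to the cluster that increases ANMI, until a full sweep changes nothing."""
    labels = list(init_labels)
    best = anmi(labels, rows)
    for _ in range(max_sweeps):
        moved = False
        for i in range(len(rows)):
            keep = labels[i]
            for c in range(k):
                if c == keep:
                    continue
                labels[i] = c
                score = anmi(labels, rows)
                if score > best + 1e-12:  # accept only strict improvements
                    best, keep, moved = score, c, True
            labels[i] = keep
        if not moved:
            break
    return labels, best


if __name__ == "__main__":
    # Toy data: two attributes that agree on a two-group structure.
    rows = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")]
    labels, score = k_anmi_sketch(rows, k=2, init_labels=[0, 1, 0, 1])
    print(labels, round(score, 6))
```

On the toy data, the search starts from a labeling that is independent of both attributes and ends with the first two objects in one cluster and the last two in the other. Like k-means, a local search of this kind depends on the initial assignment and can stop at a local optimum.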