Computation of initial modes for K-modes clustering algorithm using evidence accumulation

  • Authors:
  • Shehroz S. Khan;Shri Kant

  • Affiliations:
  • National University of Ireland Galway, Department of Information Technology, Galway, Republic of Ireland;Scientific Analysis Group, Defence R&D Organization, Delhi, India

  • Venue:
  • IJCAI'07 Proceedings of the 20th international joint conference on Artificial intelligence
  • Year:
  • 2007

Abstract

The clustering accuracy of partitional clustering algorithms for categorical data depends primarily on the choice of initial data points (modes) used to start the clustering process. Traditionally, initial modes are chosen randomly. As a consequence, the clustering results cannot be reproduced consistently. In this paper we present an approach to compute initial modes for the K-modes clustering algorithm to cluster categorical data sets. Here, we utilize the idea of Evidence Accumulation for combining the results of multiple clusterings. Initially, the n F-dimensional data are decomposed into a large number of compact clusters; the K-modes algorithm performs this decomposition, with several clusterings obtained by N random initializations of the K-modes algorithm. The modes obtained from every randomly initialized run are stored in a Mode-Pool, PN. The objective is to investigate the contribution of those data objects/patterns that are less vulnerable to the random selection of modes, and to choose the most diverse set of modes from the available Mode-Pool for use as initial modes for the K-modes clustering algorithm. Experimentally, we found that this method yields initial modes that are very similar to the actual/desired modes and gives consistent and better clustering results, with less variance in clustering error, than the traditional method of choosing random modes.
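
The pipeline outlined in the abstract (N randomly initialized K-modes runs, a Mode-Pool, then selection of a diverse set of modes) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how the most diverse modes are chosen, so the greedy farthest-point rule under Hamming distance used here is an assumption, and all function names (hamming, k_modes, build_mode_pool, pick_diverse_modes) are hypothetical.

```python
import random
from collections import Counter


def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(data, k, rng, max_iter=20):
    """Plain K-modes with randomly chosen initial modes; returns the final modes."""
    modes = rng.sample(data, k)
    for _ in range(max_iter):
        # Assign every record to its nearest mode (ties go to the lowest index).
        clusters = [[] for _ in range(k)]
        for row in data:
            nearest = min(range(k), key=lambda j: hamming(row, modes[j]))
            clusters[nearest].append(row)
        # Recompute each mode as the most frequent value per attribute.
        new_modes = []
        for j, members in enumerate(clusters):
            if not members:
                new_modes.append(modes[j])  # keep the old mode for an empty cluster
                continue
            new_modes.append(
                tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))
            )
        if new_modes == modes:
            break
        modes = new_modes
    return modes


def build_mode_pool(data, k, n_runs, seed=0):
    """Collect the final modes of n_runs randomly initialized K-modes runs."""
    rng = random.Random(seed)
    pool = []
    for _ in range(n_runs):
        pool.extend(k_modes(data, k, rng))
    return pool


def pick_diverse_modes(pool, k):
    """ASSUMPTION: greedy farthest-point selection, not the paper's stated rule.

    Start from the mode that occurs most often in the pool, then repeatedly
    add the pooled mode whose nearest already-chosen mode is farthest away.
    """
    counts = Counter(pool)
    candidates = list(counts)
    chosen = [counts.most_common(1)[0][0]]
    while len(chosen) < k:
        best = max(candidates, key=lambda m: min(hamming(m, c) for c in chosen))
        chosen.append(best)
    return chosen
```

Under these assumptions, records are passed as a list of equal-length tuples, for example pool = build_mode_pool(data, k=2, n_runs=10) followed by initial_modes = pick_diverse_modes(pool, 2). The selected modes would then seed one final K-modes run in place of a random draw, which is what makes the result repeatable.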