QROCK: A quick version of the ROCK algorithm for clustering of categorical data

  • Authors:
  • M. Dutta; A. Kakoti Mahanta; Arun K. Pujari

  • Affiliations:
  • Department of Information Technology, Tezpur University, Tezpur 784 028, India; Department of Computer Science, Guwahati University, Gopi Nath Bordoloi Nagar, Jalukbari, Guwahati, Assam 781 014, India; Department of CIS, University of Hyderabad, Hyderabad 500 046, India

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2005

Abstract

The ROCK algorithm is an agglomerative hierarchical clustering algorithm for clustering categorical data [Guha, S., Rastogi, R., Shim, K., 1999. ROCK: A robust clustering algorithm for categorical attributes. In: Proc. IEEE Internat. Conf. Data Engineering, Sydney, March 1999]. In this paper we prove that, under certain conditions, the final clusters obtained by the algorithm are exactly the connected components of a certain graph whose vertices are the input data points. We propose a new algorithm, QROCK, which computes the clusters by determining the connected components of this graph. This yields a very efficient method of obtaining the clusters and drastically reduces the computing time of the ROCK algorithm. We also argue that it is more practical to specify the similarity threshold than to specify the desired number of clusters a priori. In the process, QROCK also detects outliers. We also discuss a new similarity measure for categorical attributes.
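The core idea, clusters as connected components of a graph on the data points, can be illustrated with a short sketch. It rests on assumptions the abstract does not spell out: the graph is taken to be a ROCK-style neighbour graph in which two records are joined when their similarity reaches a threshold `theta`, similarity is plain Jaccard similarity over (attribute, value) pairs (the paper's new similarity measure is not reproduced here), and isolated points are reported as outliers. The function names `jaccard` and `qrock_sketch` are hypothetical, and this is not the authors' implementation.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two records viewed as sets of (attribute, value) items."""
    return len(a & b) / len(a | b) if (a or b) else 1.0


def qrock_sketch(records, theta):
    """Hypothetical sketch: clusters are connected components of a neighbour graph.

    Assumption: records i and j share an edge when jaccard(...) >= theta.
    Components are found with union-find; singleton components are returned
    as outliers. Returns (clusters, outliers) as lists of record indices.
    """
    n = len(records)
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x, y):
        rx, ry = find(x), find(y)
        if rx != ry:
            parent[ry] = rx

    # Encode each record as a set of (attribute index, value) pairs.
    items = [frozenset(enumerate(r)) for r in records]
    # Naive O(n^2) pass over all pairs to add neighbour edges.
    for i, j in combinations(range(n), 2):
        if jaccard(items[i], items[j]) >= theta:
            union(i, j)

    # Group indices by component root, preserving first-seen order.
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    clusters = [g for g in groups.values() if len(g) > 1]
    outliers = [g[0] for g in groups.values() if len(g) == 1]
    return clusters, outliers


# Illustrative toy data (invented for this example).
records = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("red", "large", "square"),
    ("blue", "large", "oval"),
    ("green", "tiny", "star"),
    ("blue", "large", "round"),
]

clusters, outliers = qrock_sketch(records, theta=0.5)
print(clusters)  # [[0, 1, 2], [3, 5]]
print(outliers)  # [4]
```

With three attributes per record, `theta=0.5` joins records that agree on at least two attributes. Records 0 and 2 share only one attribute but still land in the same cluster through record 1, which is the transitive behaviour of connected components. Union-find is used here because it adds edges incrementally without storing the graph; a breadth-first search over an adjacency list would find the same components.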