RACK: RApid clustering using K-means algorithm

  • Authors:
  • Vikas K. Garg; M. N. Murty

  • Affiliations:
  • Department of Computer Science and Automation, Indian Institute of Science, Bangalore, India (both authors)

  • Venue:
  • CASE'09: Proceedings of the Fifth Annual IEEE International Conference on Automation Science and Engineering
  • Year:
  • 2009

Abstract

The k-means algorithm is an extremely popular technique for clustering data. One of its major limitations is that the time to cluster a given dataset D is linear in the number of clusters, k. In this paper, we employ height-balanced trees to address this issue. Specifically, we make two major contributions: (a) we propose an algorithm, RACK (an acronym for RApid Clustering using K-means), whose running time compares favorably with the fastest known existing techniques, and (b) we prove an expected bound on the quality of clustering achieved using RACK. Our experimental results on large datasets strongly suggest that RACK is competitive with the k-means algorithm in terms of clustering quality, while taking significantly less time.
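
To illustrate the limitation the abstract refers to, the following is a minimal sketch of the assignment step of standard (Lloyd-style) k-means, not of RACK itself, whose tree-based procedure is not described here. The function names, the counter, and the sample data are illustrative assumptions. The sketch shows that each point is compared against every center, so one pass costs n × k distance evaluations.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_points(points: List[Point], centers: List[Point]) -> Tuple[List[int], int]:
    """Assign each point to its nearest center by exhaustive search.

    Returns the label of each point and the number of distance
    evaluations performed (hypothetical counter, added for illustration).
    """
    labels: List[int] = []
    distance_evals = 0
    for p in points:
        best_index, best_dist = 0, float("inf")
        # Every point is compared with all k centers: this inner loop
        # is the source of the linear dependence on k.
        for j, c in enumerate(centers):
            d = squared_distance(p, c)
            distance_evals += 1
            if d < best_dist:
                best_index, best_dist = j, d
        labels.append(best_index)
    return labels, distance_evals


# Illustrative data: n = 4 points, k = 2 centers.
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
centers = [(0.0, 0.0), (5.0, 5.0)]
labels, evals = assign_points(points, centers)
print(labels, evals)
```

Doubling the number of centers in this sketch doubles the distance evaluations per pass, which is the cost that a tree-structured search over centers aims to reduce.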