RACK: RApid clustering using K-means algorithm

Authors:
Vikas K. Garg; M. N. Murty
Affiliations:
Department of Computer Science and Automation, Indian Institute of Science, Bangalore, India (both authors)
Venue:
CASE '09: Proceedings of the Fifth Annual IEEE International Conference on Automation Science and Engineering
Year:
2009

Abstract

The k-means algorithm is an extremely popular technique for clustering data. One of the major limitations of k-means is that the time to cluster a given dataset D is linear in the number of clusters, k. In this paper, we employ height-balanced trees to address this issue. Specifically, we make two major contributions: (a) we propose an algorithm, RACK (acronym for RApid Clustering using k-means), whose running time compares favorably with the fastest known existing techniques, and (b) we prove an expected bound on the quality of clustering achieved using RACK. Our experimental results on large datasets strongly suggest that RACK is competitive with the k-means algorithm in terms of clustering quality, while taking significantly less time.
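The abstract does not describe how RACK builds or uses its trees, so the following is only a hypothetical Python sketch of the cost being targeted. It is not the authors' method. `kmeans` is plain Lloyd's k-means, whose assignment step scans all k centers for every point (k distance evaluations per point per iteration). `Node`, `build_tree`, and `tree_assign` are illustrative names for an assumed alternative: a height-balanced binary tree over the centers, built by median splits along the widest coordinate, through which a point descends greedily with 2 distance evaluations per level, about 2·ceil(log2 k) in total. Greedy descent is an approximation and can miss the true nearest center.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def kmeans(data: List[Point], k: int, iters: int = 50, seed: int = 0) -> List[Point]:
    """Plain Lloyd's k-means with randomly sampled initial centers."""
    rng = random.Random(seed)
    centers = rng.sample(data, k)
    for _ in range(iters):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in data:
            # Linear scan over all k centers: k distance evaluations per point.
            j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            clusters[j].append(p)
        # An empty cluster keeps its previous center.
        new_centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break  # converged: the next assignment would be identical
        centers = new_centers
    return centers


class Node:
    """Hypothetical tree node; `rep` is the mean of the centers below it."""

    def __init__(self, idx: List[int], centers: List[Point]) -> None:
        self.idx = idx
        self.rep = mean([centers[i] for i in idx])
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


def build_tree(idx: List[int], centers: List[Point]) -> Node:
    """Recursively split the centers at the median of their widest coordinate.
    Halving at every level keeps the height at ceil(log2 k)."""
    node = Node(idx, centers)
    if len(idx) > 1:
        dim = len(centers[0])
        spread = [max(centers[i][d] for i in idx) - min(centers[i][d] for i in idx)
                  for d in range(dim)]
        axis = spread.index(max(spread))
        order = sorted(idx, key=lambda i: centers[i][axis])
        half = len(order) // 2
        node.left = build_tree(order[:half], centers)
        node.right = build_tree(order[half:], centers)
    return node


def tree_assign(p: Point, root: Node) -> Tuple[int, int]:
    """Greedy descent toward the closer child mean (approximate).
    Returns (center index, number of distance evaluations)."""
    node, evals = root, 0
    while node.left is not None and node.right is not None:
        go_left = sq_dist(p, node.left.rep) <= sq_dist(p, node.right.rep)
        evals += 2
        node = node.left if go_left else node.right
    return node.idx[0], evals


if __name__ == "__main__":
    centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
    root = build_tree(list(range(len(centers))), centers)
    print(tree_assign((9.5, 9.0), root))  # (3, 4): center (10, 10), 4 evaluations
```

With only four centers the tree saves nothing, but the gap grows with k: for k = 1024 the descent uses 20 distance evaluations per point instead of 1024. The price is that a greedy descent may return a center that is not the nearest one, which is why a method of this kind needs a separate argument about clustering quality.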