Clustering with Lower Bound on Similarity

  • Authors:
  • Mohammad Al Hasan;Saeed Salem;Benjarath Pupacdi;Mohammed J. Zaki

  • Affiliations:
  • Department of Computer Science, Rensselaer Polytechnic Institute, Troy,;Department of Computer Science, Rensselaer Polytechnic Institute, Troy,;Chulabhorn Research Institute, Laksi, Bangkok, Thailand;Department of Computer Science, Rensselaer Polytechnic Institute, Troy,

  • Venue:
  • PAKDD '09 Proceedings of the 13th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
  • Year:
  • 2009

Abstract

We propose a new method, called SimClus, for clustering with a lower bound on similarity. Instead of taking the number of clusters k as input, this similarity-based approach imposes a lower bound on the similarity between each object and its cluster representative (one representative per cluster). SimClus achieves an O(log n) approximation bound on the number of clusters, whereas the bound for the best previous algorithm can be as poor as O(n). Experiments on real and synthetic datasets show that our algorithm produces more than 40% fewer representative objects, yet offers the same or better clustering quality. We also propose a dynamic variant of the algorithm, which can be used effectively in an on-line setting.
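The abstract does not spell out how SimClus works, so the following is only a minimal sketch of the problem setting, not the authors' algorithm. It assumes a generic greedy set-cover heuristic, which is one standard way to obtain a logarithmic approximation on the number of selected items: repeatedly pick the object that is similar enough (at least `beta`) to the most still-uncovered objects, then assign every object to its most similar representative. The names `greedy_representatives`, `sim`, and `beta` are hypothetical and chosen for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def greedy_representatives(
    objects: Sequence[T],
    sim: Callable[[T, T], float],
    beta: float,
) -> Tuple[List[int], Dict[int, int]]:
    """Pick representatives so each object has similarity >= beta to one.

    Generic greedy set-cover sketch (an assumption, not SimClus itself).
    Returns (indices of representatives, object index -> representative index).
    """
    n = len(objects)
    # covers[i]: indices of objects that object i could represent.
    # Every object is allowed to represent itself, so a cover always exists.
    covers = [
        {j for j in range(n) if j == i or sim(objects[i], objects[j]) >= beta}
        for i in range(n)
    ]

    uncovered = set(range(n))
    reps: List[int] = []
    while uncovered:
        # Greedy step: choose the object covering the most uncovered objects.
        best = max(range(n), key=lambda i: len(covers[i] & uncovered))
        reps.append(best)
        uncovered -= covers[best]

    # Assign each object to its most similar chosen representative.
    assignment = {
        j: max(reps, key=lambda r: sim(objects[r], objects[j]))
        for j in range(n)
    }
    return reps, assignment


if __name__ == "__main__":
    # Toy 1-D data with a similarity that decays with distance (illustrative).
    points = [0.0, 0.1, 0.2, 5.0, 5.1, 9.0]
    similarity = lambda a, b: 1.0 / (1.0 + abs(a - b))
    reps, assignment = greedy_representatives(points, similarity, beta=0.8)
    print("representatives:", [points[r] for r in reps])
    print("assignment:", {points[j]: points[r] for j, r in assignment.items()})
```

Note that the number of clusters is an output here rather than an input: raising `beta` forces tighter clusters and therefore more representatives, while lowering it allows fewer.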