Efficient Yet Accurate Clustering

  • Authors:
  • Manoranjan Dash, Kian-Lee Tan, Huan Liu


  • Venue:
  • ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
  • Year:
  • 2001

Abstract

In this paper, we show that most hierarchical agglomerative clustering (HAC) algorithms follow a 90-10 rule: roughly the first 90% of iterations merge cluster pairs whose dissimilarity is less than 10% of the maximum dissimilarity. We propose two algorithms, 2-phase and nested, based on partially overlapping partitioning (POP). To handle high-dimensional data efficiently, we propose a tree structure particularly suited to POP. Extensive experiments show that the proposed algorithms significantly reduce the time and memory requirements of existing HAC algorithms without compromising accuracy.
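The 90-10 rule can be checked empirically on a small data set. The sketch below is a minimal illustration under stated assumptions, not the authors' procedure: it uses naive single-linkage HAC with Euclidean distance and takes "maximum dissimilarity" to mean the largest pairwise distance between points. The function names and the synthetic blob data are hypothetical.

```python
import math
import random


def single_linkage_merge_distances(points):
    """Naive single-linkage HAC (assumed linkage); returns each merge's dissimilarity in order."""
    clusters = [[i] for i in range(len(points))]  # each cluster is a list of point indices
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the closest pair of clusters (single linkage = closest cross-cluster pair).
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]
        merges.append(d)
    return merges


def max_pairwise(points):
    """Assumed meaning of 'maximum dissimilarity': largest distance between any two points."""
    return max(math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:])


def fraction_below(merges, max_dissimilarity, threshold=0.10):
    """Fraction of merges whose dissimilarity is strictly below threshold * max_dissimilarity."""
    cutoff = threshold * max_dissimilarity
    return sum(1 for d in merges if d < cutoff) / len(merges)


if __name__ == "__main__":
    # Hypothetical synthetic data: three tight Gaussian blobs far apart.
    rng = random.Random(0)
    centers = [(0.0, 0.0), (50.0, 0.0), (25.0, 40.0)]
    pts = [(cx + rng.gauss(0, 1), cy + rng.gauss(0, 1))
           for cx, cy in centers for _ in range(30)]
    merges = single_linkage_merge_distances(pts)
    share = fraction_below(merges, max_pairwise(pts))
    print(f"{share:.1%} of merges are below 10% of the maximum dissimilarity")
```

On well-separated data like this, most merges happen inside the tight blobs and only the last few join distant groups, which is the pattern the rule describes. The exact percentage depends on the data, the linkage, and the chosen definition of maximum dissimilarity.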