pPOP: Fast yet accurate parallel hierarchical clustering using partitioning
Data & Knowledge Engineering
In this paper we show that most hierarchical agglomerative clustering (HAC) algorithms follow a 90-10 rule: roughly the first 90% of iterations merge cluster pairs whose dissimilarity is less than 10% of the maximum dissimilarity. We propose two algorithms, 2-phase and nested, based on partially overlapping partitioning (POP). To handle high-dimensional data efficiently, we propose a tree structure particularly suited to POP. Extensive experiments show that the proposed algorithms significantly reduce the time and memory requirements of existing HAC algorithms without compromising accuracy.
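The 90-10 rule can be checked empirically on a small data set. The sketch below is a minimal illustration under stated assumptions, not the authors' method: it assumes single-linkage as the HAC variant and Euclidean distance as the dissimilarity, and it takes "maximum dissimilarity" to mean the largest pairwise distance in the data. The helper names (`pairwise`, `single_linkage_merges`, `fraction_below`) are hypothetical.

```python
import math
import random


def pairwise(points):
    """Full symmetric dissimilarity matrix (Euclidean distance; an assumption)."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def single_linkage_merges(points):
    """Naive single-linkage HAC; returns the dissimilarity of each merge, in merge order.

    After merging clusters i and j, distances are updated with the single-linkage
    Lance-Williams rule d(k, i+j) = min(d(k, i), d(k, j)).
    """
    dist = pairwise(points)
    active = set(range(len(points)))
    merges = []
    while len(active) > 1:
        # Find the closest pair of active clusters (ties broken by index).
        d, i, j = min((dist[a][b], a, b) for a in active for b in active if a < b)
        merges.append(d)
        # Cluster i absorbs cluster j; update distances from every other active cluster.
        for k in active:
            if k != i and k != j:
                dist[i][k] = dist[k][i] = min(dist[i][k], dist[j][k])
        active.remove(j)
    return merges


def fraction_below(points, ratio=0.1):
    """Fraction of merges whose dissimilarity is below ratio * max pairwise dissimilarity."""
    dmax = max(max(row) for row in pairwise(points))
    merges = single_linkage_merges(points)
    return sum(d < ratio * dmax for d in merges) / len(merges)


if __name__ == "__main__":
    # Hypothetical demo data: three well-separated 2-D Gaussian blobs.
    rng = random.Random(0)
    centers = [(0.0, 0.0), (50.0, 0.0), (25.0, 40.0)]
    pts = [(cx + rng.gauss(0, 1), cy + rng.gauss(0, 1))
           for cx, cy in centers for _ in range(40)]
    print(f"{fraction_below(pts):.2%} of merges are below 10% of the max dissimilarity")
```

This naive version costs O(n^3) time and O(n^2) memory, which is the kind of cost the abstract says POP reduces. If most merges involve close pairs, they can plausibly be found inside small, partially overlapping partitions instead of over the full dissimilarity matrix; how the 2-phase and nested algorithms do this is described in the paper itself, not in this sketch.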