BIRCH: an efficient data clustering method for very large databases
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
CURE: an efficient clustering algorithm for large databases
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Hi-index | 0.00 |
Hierarchical clustering is common method used to determine clusters of similar data points in multi-dimensional spaces. $O(n^2)$ algorithms, where $n$ is the number of points to cluster, have long been known for this problem. This paper discusses parallel algorithms to perform hierarchical clustering using various distance metrics. I describe $O(n)$ time algorithms for clustering using the single link, average link, complete link, centroid, median, and minimum variance metrics on an $n$ node CRCW PRAM and $O(n \log n)$ algorithms for these metrics (except average link and complete link) on $\frac{n}{\log n}$ node butterfly networks or trees. Thus, optimal efficiency is achieved for a significant number of processors using these distance metrics. A general algorithm is given that can be used to perform clustering with the complete link and average link metrics on a butterfly. While this algorithm achieves optimal efficiency for the general class of metrics, it is not optimal for the specific cases of complete link and average link clustering.