Hierarchical agglomerative clustering (HAC) is very useful, but its high CPU time and memory complexity limit its practical use. Earlier, we proposed an efficient partitioning scheme, partially overlapping partitioning (POP), based on the observation that HAC agglomerates small, closely placed clusters first and larger, more distant clusters only towards the end. Here, we present pPOP, the parallel version of POP. Theoretical analysis shows that, compared to existing algorithms, pPOP achieves a CPU time speed-up and a memory scale-down of O(c) without compromising accuracy, where c is the number of cells in the partition. A shared-memory implementation shows that pPOP significantly outperforms existing algorithms.
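
The partitioning idea can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not specify how cells are built or how they overlap, so the sketch assumes cells are slabs along the first coordinate, each extended to the right by an overlap margin `delta`. All names here (`closest_pair`, `overlapping_cells`, `parallel_closest_pair`, `delta`) are hypothetical. Under that assumption, any two points closer than `delta` fall together in at least one cell, so the closest pair can be found per cell in parallel and then reduced.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
import math


def closest_pair(points):
    """Brute-force closest pair: (distance, p, q), or None if fewer than 2 points."""
    best = None
    for p, q in combinations(points, 2):
        d = math.dist(p, q)
        if best is None or d < best[0]:
            best = (d, p, q)
    return best


def overlapping_cells(points, c, delta):
    """Split points into c slabs along x; each slab overlaps its right neighbour by delta.

    Assumed layout (not taken from the paper): cell i covers
    [lo + i*width, lo + (i+1)*width + delta).
    """
    xs = [p[0] for p in points]
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / c or 1.0  # avoid zero width when all x are equal
    cells = [[] for _ in range(c)]
    for p in points:
        for i in range(c):
            left = lo + i * width
            right = left + width + delta
            # The last cell also absorbs anything at or beyond its left edge.
            if left <= p[0] < right or (i == c - 1 and p[0] >= left):
                cells[i].append(p)
    return cells


def parallel_closest_pair(points, c, delta):
    """Closest pair via per-cell search; None if no pair is closer than delta."""
    cells = overlapping_cells(points, c, delta)
    with ThreadPoolExecutor() as pool:
        results = [r for r in pool.map(closest_pair, cells) if r is not None]
    if not results:
        return None
    best = min(results, key=lambda r: r[0])
    # Pairs at distance >= delta may straddle non-overlapping cells,
    # so the per-cell answer is only trusted below delta.
    return best if best[0] < delta else None


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (4.9, 0.0), (5.2, 0.0), (10.0, 0.0)]
    print(parallel_closest_pair(pts, c=2, delta=1.0))
```

Each cell holds only a fraction of the points, which is where a per-cell reduction in pairwise work and memory would come from. Threads in CPython do not speed up this CPU-bound loop, so the thread pool only shows the structure of the parallel step. A `None` result marks the point at which a full implementation would have to fall back to ordinary HAC on the remaining clusters.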