In several contexts and domains, hierarchical agglomerative clustering (HAC) offers best-quality results, but at the price of a high complexity that limits the size of the datasets that can be handled. In some contexts, in particular, computing distances between objects is the most expensive task. In this paper we propose a pruning heuristic aimed at improving performance in these cases; it is well integrated into all phases of the HAC process and can be applied to two HAC variants: single-linkage and complete-linkage. After describing the method, we provide some theoretical evidence of its pruning power, followed by an empirical study of its effectiveness over different data domains, with a special focus on dimensionality issues.
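To make the cost being targeted concrete, the following is a minimal sketch of how exact distance computations can be skipped when searching for the closest pair, which is the first merge step of HAC. It is not the heuristic proposed in the paper, whose details are not given in this abstract. It assumes a generic pivot-based lower bound derived from the triangle inequality, which is valid only when the distance is a true metric. The names `CountingMetric`, `closest_pair_pruned`, and `num_pivots` are hypothetical and introduced only for illustration.

```python
import itertools
import math


class CountingMetric:
    """Wraps a distance function and counts how many times it is called."""

    def __init__(self, dist):
        self.dist = dist
        self.calls = 0

    def __call__(self, a, b):
        self.calls += 1
        return self.dist(a, b)


def closest_pair_pruned(points, metric, num_pivots=3):
    """Return (distance, i, j) for a closest pair of points.

    Assumption: `metric` satisfies the triangle inequality. The first
    `num_pivots` points are used as pivots (an arbitrary illustrative choice).
    """
    n = len(points)
    pivots = list(range(min(num_pivots, n)))

    # Exact distances from every point to every pivot: n * len(pivots) calls.
    to_pivot = [[metric(points[i], points[p]) for p in pivots] for i in range(n)]

    best = (math.inf, None, None)
    for i, j in itertools.combinations(range(n), 2):
        # Triangle inequality: |d(i, p) - d(j, p)| <= d(i, j) for any pivot p,
        # so the largest such difference is a lower bound on d(i, j).
        lower = max(
            (abs(a - b) for a, b in zip(to_pivot[i], to_pivot[j])),
            default=0.0,
        )
        if lower >= best[0]:
            continue  # this pair cannot beat the current best; skip the exact call
        d = metric(points[i], points[j])
        if d < best[0]:
            best = (d, i, j)
    return best
```

Wrapping an expensive distance in `CountingMetric` and comparing `calls` against the `n * (n - 1) / 2` evaluations of a brute-force scan gives a rough sense of how much such a bound prunes on a given dataset. The saving depends heavily on the data distribution and on the dimensionality, which is the kind of effect the empirical study mentioned above examines.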