Computational geometry: an introduction
Computational geometry: an introduction
Algorithms for clustering data
Algorithms for clustering data
The SEQUOIA 2000 storage benchmark
SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
BIRCH: an efficient data clustering method for very large databases
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
CURE: an efficient clustering algorithm for large databases
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
OPTICS: ordering points to identify the clustering structure
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
A multiple-resolution method for edge-centric data clustering
Proceedings of the eighth international conference on Information and knowledge management
Data mining: concepts and techniques
Data mining: concepts and techniques
VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
Optimal Grid-Clustering: Towards Breaking the Curse of Dimensionality in High-Dimensional Clustering
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Near Neighbor Search in Large Metric Spaces
VLDB '95 Proceedings of the 21th International Conference on Very Large Data Bases
pPOP: Fast yet accurate parallel hierarchical clustering using partitioning
Data & Knowledge Engineering
Rough clustering of sequential data
Data & Knowledge Engineering
Improving density-based methods for hierarchical clustering of web pages
Data & Knowledge Engineering
Distance based fast hierarchical clustering method for large datasets
RSCTC'10 Proceedings of the 7th international conference on Rough sets and current trends in computing
Domain taxonomy learning from text: The subsumption method versus hierarchical clustering
Data & Knowledge Engineering
Hi-index | 0.00 |
Clustering is the task of grouping similar objects into clusters. A prominent and useful class of algorithm is hierarchical agglomerative clustering (HAC) which iteratively agglomerates the closest pair until all data points belong to one cluster. It outputs a dendrogram showing all N levels of agglomerations where N is the number of objects in the dataset. However, HAC methods have several drawbacks: (1) high time and memory complexities for clustering, and (2) inefficient and inaccurate cluster validation. In this paper we show that these drawbacks can be alleviated by closely studying the dendrogram. Empirical study shows that most HAC algorithms follow a trend where, except for a number of top levels of the dendrogram, all lower levels agglomerate clusters which are very small in size and close in proximity to other clusters. Methods are proposed that exploit this characteristic to reduce the time and memory complexities significantly and to make validation very efficient and accurate. Analyses and experiments show the effectiveness of the proposed method.