Distance based fast hierarchical clustering method for large datasets

Authors:
Bidyut Kr. Patra;Neminath Hubballi;Santosh Biswas;Sukumar Nandi
Affiliations:
Department of Computer Science and Engineering, Indian Institute of Technology Guwahati, Assam, India;Department of Computer Science and Engineering, Indian Institute of Technology Guwahati, Assam, India;Department of Computer Science and Engineering, Indian Institute of Technology Guwahati, Assam, India;Department of Computer Science and Engineering, Indian Institute of Technology Guwahati, Assam, India
Venue:
RSCTC'10 Proceedings of the 7th international conference on Rough sets and current trends in computing
Year:
2010

Abstract

Average-link (AL) is a distance-based hierarchical clustering method that is not sensitive to noisy patterns. However, like all hierarchical clustering methods, AL needs to scan the dataset many times, and its time and space complexity are both O(n²), where n is the size of the dataset. These costs prohibit the use of AL for large datasets. In this paper, we propose a distance-based hierarchical clustering method, termed l-AL, which speeds up the classical AL method in any metric (vector or non-vector) space. In this scheme, the leaders clustering method is first applied to the dataset to derive a set of leaders, and AL clustering is then applied to the leaders. To speed up the leaders clustering method itself, a technique for reducing the number of distance computations is also proposed. Experimental results confirm that the l-AL method is considerably faster than the classical AL method while keeping clustering results on par with it.
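
The two-stage scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names leaders, average_link, l_al_sketch, the threshold tau, and the target cluster count k are hypothetical and chosen only for illustration. The sketch assumes Euclidean distance, a single-scan leaders pass in which each pattern joins the first leader found within tau, and a naive, unweighted average-link step over the leaders. The reduction in distance computations mentioned in the abstract is not shown, because the abstract does not describe it.

```python
import math
from itertools import combinations


def euclidean(a, b):
    # Assumed metric for this sketch; any metric function can be passed instead.
    return math.dist(a, b)


def leaders(data, tau, dist=euclidean):
    """Single-scan leaders clustering (illustrative).

    Each pattern joins the first existing leader within distance tau;
    otherwise it becomes a new leader. Returns the indices of the leaders
    and, for each leader, the indices of its followers (leader included).
    """
    leader_idx = []
    followers = []
    for i, x in enumerate(data):
        for pos, l in enumerate(leader_idx):
            if dist(x, data[l]) <= tau:
                followers[pos].append(i)
                break
        else:
            # No leader is close enough, so x starts a new group.
            leader_idx.append(i)
            followers.append([i])
    return leader_idx, followers


def average_link(points, k, dist=euclidean):
    """Naive average-link agglomeration down to k clusters (illustrative)."""
    clusters = [[i] for i in range(len(points))]

    def avg(a, b):
        # Mean pairwise distance between the members of two clusters.
        total = sum(dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        # Merge the pair of clusters with the smallest average distance.
        p, q = min(
            combinations(range(len(clusters)), 2),
            key=lambda pq: avg(clusters[pq[0]], clusters[pq[1]]),
        )
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # q > p, so index p stays valid
    return clusters


def l_al_sketch(data, tau, k, dist=euclidean):
    """Leaders first, then average-link on the leaders only (assumed scheme)."""
    leader_idx, followers = leaders(data, tau, dist)
    leader_points = [data[i] for i in leader_idx]
    groups = average_link(leader_points, k, dist)
    # Expand each group of leaders back to the original pattern indices.
    return [sorted(i for g in group for i in followers[g]) for group in groups]


if __name__ == "__main__":
    points = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1)]
    print(l_al_sketch(points, tau=0.5, k=2))
```

Because the quadratic average-link step runs over the leaders rather than over all n patterns, its cost depends on the number of leaders, which is governed by the choice of tau in this sketch.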