Distance based fast hierarchical clustering method for large datasets

  • Authors:
  • Bidyut Kr. Patra; Neminath Hubballi; Santosh Biswas; Sukumar Nandi

  • Affiliations:
  • Department of Computer Science and Engineering, Indian Institute of Technology Guwahati, Assam, India (all four authors)

  • Venue:
  • RSCTC'10: Proceedings of the 7th International Conference on Rough Sets and Current Trends in Computing
  • Year:
  • 2010

Abstract

Average-link (AL) is a distance-based hierarchical clustering method that is not sensitive to noisy patterns. However, like all hierarchical clustering methods, AL needs to scan the dataset many times, and it has time and space complexity of O(n²), where n is the size of the dataset. These costs prohibit the use of AL for large datasets. In this paper, we propose a distance-based hierarchical clustering method, termed l-AL, which speeds up the classical AL method in any metric (vector or non-vector) space. In this scheme, the leaders clustering method is first applied to the dataset to derive a set of leaders, and AL clustering is subsequently applied to the leaders. To speed up the leaders clustering method itself, a technique for reducing the number of distance computations is also proposed. Experimental results confirm that l-AL is considerably faster than the classical AL method while keeping clustering results on par with it.
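
The two-stage scheme described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names (`leaders`, `average_link`, `l_al_sketch`), the leader threshold `tau`, and the merge cut-off `h` are hypothetical names introduced for illustration. The sketch assumes a single-scan leaders method with a fixed distance threshold, treats each leader as a plain unweighted point in the average-link stage, and omits the paper's reduction in distance computations, whose details the abstract does not give.

```python
import math

def leaders(points, tau, dist):
    """Single scan: a point follows the first leader within tau, else becomes a leader."""
    leader_idx = []   # indices (into points) of the leaders
    followers = []    # followers[k] = indices of points assigned to leader k
    for i, p in enumerate(points):
        for k, l in enumerate(leader_idx):
            if dist(p, points[l]) <= tau:
                followers[k].append(i)
                break
        else:
            # No existing leader is close enough, so this point starts a new group.
            leader_idx.append(i)
            followers.append([i])
    return leader_idx, followers

def average_link(items, dist, h):
    """Naive agglomerative average-link; stops when the closest pair is farther than h."""
    clusters = [[i] for i in range(len(items))]

    def avg(a, b):
        # Mean pairwise distance between the members of clusters a and b.
        total = sum(dist(items[i], items[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        d, x, y = min(
            (avg(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        if d > h:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters

def l_al_sketch(points, tau, h, dist=math.dist):
    """Leaders first, then average-link on the leaders only (illustrative assumption)."""
    leader_idx, followers = leaders(points, tau, dist)
    leader_points = [points[i] for i in leader_idx]
    merged = average_link(leader_points, dist, h)
    # Expand each cluster of leaders back to the original point indices.
    return [sorted(i for k in cluster for i in followers[k]) for cluster in merged]

pts = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1)]
print(l_al_sketch(pts, tau=0.5, h=2.0))  # [[0, 1, 2], [3, 4, 5]]
```

Because the quadratic average-link stage runs only on the leaders, its cost depends on the number of leaders rather than on n. Any metric can be passed as `dist`, which matches the abstract's claim that the method applies to vector and non-vector spaces; `math.dist` (Euclidean) is only a default chosen here.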