Fast Single-Link Clustering Method Based on Tolerance Rough Set Model

  • Authors:
  • Bidyut Kr. Patra; Sukumar Nandi

  • Affiliations:
  • Department of Computer Science & Engineering, Indian Institute of Technology Guwahati, India 781039 (both authors)

  • Venue:
  • RSFDGrC '09 Proceedings of the 12th International Conference on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing
  • Year:
  • 2009

Abstract

The single-link (SL) clustering method does not scale with the size of the dataset and needs many database scans. This is potentially a severe problem for large datasets. One way to speed up the SL method is to summarize the data efficiently and then apply the SL method to the summary. In this paper, we propose a summarization scheme based on the tolerance rough set model, called data sphere (DS). The SL method is modified to work with data spheres. The proposed clustering method takes considerably less time than the classical single-link method applied directly to the dataset. The clustering results produced by the proposed method are very close to those of the SL method. We also show that the proposed summarization scheme outperforms the recently introduced data bubbles (DB) summarization scheme in clustering quality when the single-link method is applied to the summary. Experiments are conducted with two synthetic and two real-world datasets to show the effectiveness of the proposed method.
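
The summarize-then-cluster idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' data-sphere construction or their modified SL method, neither of which is specified in the abstract. It assumes a simple leader-style pass in which a point joins the first representative within a tolerance threshold `tau`, followed by single-link clustering of the representatives cut at a distance threshold `h`. The names `summarize`, `single_link`, `cluster`, `tau`, and `h` are hypothetical and chosen for illustration only.

```python
import math


def summarize(points, tau):
    """One scan over the data: each point joins the first representative
    within distance tau (an assumed tolerance relation), otherwise it
    becomes a new representative."""
    reps = []     # representative points
    members = []  # members[r] = indices of points summarized by reps[r]
    for i, p in enumerate(points):
        for r, rep in enumerate(reps):
            if math.dist(p, rep) <= tau:
                members[r].append(i)
                break
        else:
            reps.append(p)
            members.append([i])
    return reps, members


def single_link(reps, h):
    """Single-link clusters at cut level h, computed as the connected
    components of the graph linking representatives at distance <= h."""
    parent = list(range(len(reps)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(len(reps)):
        for j in range(i + 1, len(reps)):
            if math.dist(reps[i], reps[j]) <= h:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(reps))]


def cluster(points, tau, h):
    """Cluster the summary, then give every point its representative's label."""
    reps, members = summarize(points, tau)
    rep_labels = single_link(reps, h)
    labels = [None] * len(points)
    for r, idxs in enumerate(members):
        for i in idxs:
            labels[i] = rep_labels[r]
    return labels


points = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 5.0), (5.2, 5.1)]
labels = cluster(points, tau=0.5, h=1.0)
```

The pairwise distance computation runs over the representatives only, which is where the saving comes from when the summary is much smaller than the dataset. Using raw distances between representatives is a simplification: a summarization scheme would normally also use statistics stored for each summary unit when measuring distance.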