Fast Single-Link Clustering Method Based on Tolerance Rough Set Model

Authors:
Bidyut Kr. Patra; Sukumar Nandi
Affiliations:
Department of Computer Science & Engineering, Indian Institute of Technology Guwahati, India 781039 (both authors)
Venue:
RSFDGrC '09 Proceedings of the 12th International Conference on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing
Year:
2009

Abstract

The single-link (SL) clustering method does not scale with the size of the dataset and requires many database scans, which is potentially a severe problem for large datasets. One way to speed up the SL method is to summarize the data efficiently and then apply the SL method to the summary. In this paper, we propose a summarization scheme based on tolerance rough set theory, called the data sphere (DS), and modify the SL method to work with data spheres. The proposed clustering method takes considerably less time than the classical SL method applied directly to the dataset, and its clustering results are very close to those of the SL method. We also show that, when the SL method is applied to the summaries, the proposed scheme yields better clustering quality than the recently introduced data bubbles (DB) summarization scheme. Experiments with two synthetic and two real-world datasets demonstrate the effectiveness of the proposed method.
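The two-stage idea (summarize first, then run single-link on the summaries) can be sketched in Python under stated assumptions. This is not the authors' data-sphere algorithm. The summarization step here is a leaders-style pass with a tolerance threshold `tau`, used as a hypothetical stand-in for the tolerance-rough-set construction, and each sphere is represented only by the mean of its members. The names `summarize`, `single_link`, and `cluster_via_spheres` are illustrative. The single-link step itself follows the standard definition: repeatedly merge the two groups joined by the shortest remaining pairwise distance (Kruskal-style, with union-find) until `k` groups remain.

```python
import math
from itertools import combinations

def summarize(points, tau):
    """Hypothetical stand-in for data-sphere construction: a leaders-style
    pass. Each point joins the first leader within distance tau; otherwise
    it becomes the leader of a new sphere."""
    leaders, members = [], []
    for i, p in enumerate(points):
        for s, lead in enumerate(leaders):
            if math.dist(p, lead) <= tau:
                members[s].append(i)
                break
        else:
            leaders.append(p)
            members.append([i])
    return leaders, members

def single_link(reps, k):
    """Classical single-link: merge the closest pair of groups (Kruskal order
    over all pairwise distances) until k groups remain. Returns one group
    label (a union-find root index) per representative."""
    parent = list(range(len(reps)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = sorted(combinations(range(len(reps)), 2),
                   key=lambda e: math.dist(reps[e[0]], reps[e[1]]))
    groups = len(reps)
    for a, b in edges:
        if groups <= k:
            break
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            groups -= 1
    return [find(i) for i in range(len(reps))]

def cluster_via_spheres(points, tau, k):
    """Summarize, run single-link on sphere means (an assumed representative),
    then give every point the label of its sphere."""
    _, members = summarize(points, tau)
    dim = len(points[0])
    means = [tuple(sum(points[i][d] for i in idx) / len(idx) for d in range(dim))
             for idx in members]
    sphere_label = single_link(means, k)
    labels = [None] * len(points)
    for s, idx in enumerate(members):
        for i in idx:
            labels[i] = sphere_label[s]
    return labels

pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (5.0, 5.0), (5.1, 5.0), (5.0, 5.2)]
labels = cluster_via_spheres(pts, tau=0.5, k=2)
```

In this sketch the quadratic pairwise step runs over the m sphere means rather than the n original points, which is where the speed-up would come from when m is much smaller than n. The trade-off is that all points inside one sphere always receive the same label, so the choice of `tau` controls how closely the result can track single-link on the raw data.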