The single-link (SL) clustering method does not scale with the size of the dataset and needs many database scans, which is potentially a severe problem for large datasets. One way to speed up the SL method is to summarize the data efficiently and then apply the SL method to the summary. In this paper, we propose a summarization scheme, called data sphere (DS), based on tolerance rough set theory. The SL method is modified to work with data spheres. The proposed clustering method takes considerably less time than the classical single-link method applied directly to the dataset, and its clustering results are very close to those of the SL method. We also show that, when single-link is applied to the summary, the proposed summarization scheme outperforms the recently introduced data bubbles (DB) scheme in terms of clustering quality. Experiments are conducted with two synthetic and two real-world datasets to show the effectiveness of the proposed method.
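The general summarize-then-cluster idea described above can be illustrated with a minimal sketch. It is not the data sphere scheme itself, whose construction is not given in this abstract. As a stand-in, it assumes a simple leader-style summarization in which each point joins the first representative within a hypothetical tolerance `tau`, then runs single-link at a hypothetical cut distance `h` on the representatives only. The function names `summarize`, `single_link`, and `cluster_via_summary` are illustrative assumptions.

```python
import math

def summarize(points, tau):
    """One scan: a point joins the first representative within tau,
    otherwise it becomes a new representative (assumed stand-in scheme)."""
    reps, members = [], []
    for i, p in enumerate(points):
        for k, q in enumerate(reps):
            if math.dist(p, q) <= tau:
                members[k].append(i)
                break
        else:
            reps.append(p)
            members.append([i])
    return reps, members

def single_link(items, h):
    """Single-link clusters at cut distance h, computed as connected
    components of the graph joining items at distance <= h."""
    parent = list(range(len(items)))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if math.dist(items[i], items[j]) <= h:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(items))]

def cluster_via_summary(points, tau, h):
    """Cluster the representatives, then give every original point
    the label of its representative."""
    reps, members = summarize(points, tau)
    rep_labels = single_link(reps, h)
    labels = [None] * len(points)
    for k, idxs in enumerate(members):
        for i in idxs:
            labels[i] = rep_labels[k]
    return labels

points = [(0, 0), (0.1, 0), (0.2, 0.1), (5, 5), (5.1, 5), (5, 5.2)]
labels = cluster_via_summary(points, tau=0.5, h=2.0)
```

In this sketch the quadratic pairwise step runs over the representatives rather than over all points, which is where the time saving of any summarization-based approach would come from. How closely the result matches single-link on the full data depends on the summarization scheme and on the chosen `tau`.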