Finding clusters in large datasets is a challenge in many fields of science and technology. Many clustering methods have been developed over the years; however, most of them need multiple scans of the data to converge, which makes them impractical for cluster analysis of large datasets. Data summarization can be used as a pre-processing step to speed up classical clustering methods on large datasets. In this paper, we propose a data summarization scheme based on tolerance rough set theory, termed rough bubble. Rough bubble uses the leaders clustering method to collect sufficient statistics of the dataset, which can then be used to cluster the dataset. We show that the proposed summarization scheme outperforms the recently introduced data bubble summarization scheme when the agglomerative hierarchical (single-link) clustering method is applied to it. We also introduce a technique to reduce the number of distance computations required by the leaders clustering method. Experiments with synthetic and real-world datasets show the effectiveness of our methods on large datasets.
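To make the single-scan idea concrete, here is a minimal sketch of the classical leaders clustering method that gathers simple per-leader statistics. It is an illustration only, not the rough bubble scheme itself: the abstract does not specify the tolerance rough set components or the distance-reduction technique, so neither is shown. The function name `leaders`, the threshold `tau`, and the fields `n`, `linear_sum`, and `square_sum` are assumptions chosen for this example.

```python
import math


def leaders(points, tau):
    """Single-scan leaders clustering (illustrative sketch).

    Each point joins the first leader found within distance tau;
    otherwise it becomes a new leader. Per-leader statistics
    (count, linear sum, sum of squared norms) are assumed fields,
    not the paper's exact definition of a rough bubble.
    """
    summaries = []
    for p in points:
        for s in summaries:
            if math.dist(p, s["leader"]) <= tau:
                # Absorb p into this leader's statistics.
                s["n"] += 1
                s["linear_sum"] = [a + b for a, b in zip(s["linear_sum"], p)]
                s["square_sum"] += sum(x * x for x in p)
                break
        else:
            # No leader is close enough: p starts a new summary.
            summaries.append({
                "leader": tuple(p),
                "n": 1,
                "linear_sum": list(p),
                "square_sum": sum(x * x for x in p),
            })
    return summaries


points = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0), (5.2, 5.1), (0.1, 0.3)]
summaries = leaders(points, tau=1.0)
for s in summaries:
    print(s["leader"], s["n"])
```

The dataset is read once, and a hierarchical method such as single-link can then operate on the much smaller list of summaries instead of on the raw points. Note that the result depends on the scan order and on the choice of `tau`.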