Density-based clustering techniques such as DBSCAN are attractive because they can find arbitrarily shaped clusters along with noisy outliers. However, DBSCAN's time requirement is O(n^2), where n is the size of the dataset, which makes it unsuitable for large datasets. The solution proposed in this paper is to first apply the leaders clustering method to derive prototypes, called leaders, from the dataset in a way that also preserves density information, and then to use these leaders to derive the density-based clusters. The proposed hybrid clustering technique, called rough-DBSCAN, has a time complexity of only O(n) and is analyzed using rough set theory. Experimental studies on both synthetic and real-world datasets compare rough-DBSCAN with DBSCAN. They show that, for large datasets, rough-DBSCAN finds a clustering similar to that found by DBSCAN but is consistently faster. Some properties of the leaders as prototypes are also formally established.
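The two-stage idea described above can be illustrated with a minimal sketch. It is not the paper's rough-DBSCAN procedure: the function names, the parameters `tau`, `eps`, and `min_pts`, and the rule of estimating a leader's neighborhood density by summing the follower counts of leaders within `eps` are all assumptions made for illustration.

```python
import math
from collections import deque


def leaders(points, tau):
    """Single-scan leaders clustering (tau is a hypothetical threshold).

    Each point joins the first leader within distance tau; otherwise it
    becomes a new leader. The follower counts keep the density information.
    """
    leader_list, counts = [], []
    for p in points:
        for i, leader in enumerate(leader_list):
            if math.dist(p, leader) <= tau:
                counts[i] += 1
                break
        else:
            # No existing leader is close enough: p becomes a leader.
            leader_list.append(p)
            counts.append(1)
    return leader_list, counts


def density_cluster_leaders(leader_list, counts, eps, min_pts):
    """DBSCAN-style clustering over leaders instead of raw points.

    Assumption: a leader is "dense" when the follower counts of all
    leaders within eps of it sum to at least min_pts. Returns one label
    per leader, with -1 marking noise.
    """
    n = len(leader_list)
    neighbors = [
        [j for j in range(n)
         if math.dist(leader_list[i], leader_list[j]) <= eps]
        for i in range(n)
    ]
    dense = [sum(counts[j] for j in neighbors[i]) >= min_pts
             for i in range(n)]
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if not dense[i] or labels[i] != -1:
            continue
        # Grow a new cluster outward from this dense leader.
        labels[i] = cluster
        queue = deque([i])
        while queue:
            cur = queue.popleft()
            for j in neighbors[cur]:
                if labels[j] == -1:
                    labels[j] = cluster
                    if dense[j]:
                        queue.append(j)
        cluster += 1
    return labels
```

In this sketch the scan in `leaders` compares each point with at most the current number of leaders, so it stays linear in n whenever the number of leaders does not grow with n. The quadratic neighborhood search is paid only over the leaders, which are typically far fewer than the original points.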