Rough-DBSCAN: A fast hybrid density based clustering method for large data sets

Authors:
P. Viswanath; V. Suresh Babu
Affiliations:
Pattern Recognition Research Lab, Department of Computer Science and Engineering, NRI Institute of Technology, Guntur 522 009, Andhra Pradesh, India; Institute for Research in Applicable Computing, Department of Computing and Information Systems, University of Bedfordshire, Luton Campus, Park Square, Luton, LU1 3JU, UK
Venue:
Pattern Recognition Letters
Year:
2009




Abstract

Density-based clustering techniques such as DBSCAN are attractive because they can find arbitrarily shaped clusters along with noisy outliers. However, DBSCAN's time requirement is O(n^2), where n is the size of the dataset, which makes it unsuitable for large datasets. The solution proposed in this paper is to first apply the leaders clustering method to derive prototypes, called leaders, from the dataset in a way that also preserves density information, and then to use these leaders to derive the density-based clusters. The proposed hybrid clustering technique, called rough-DBSCAN, has a time complexity of only O(n) and is analyzed using rough set theory. Experimental studies on both synthetic and real-world datasets compare rough-DBSCAN with DBSCAN. They show that for large datasets rough-DBSCAN finds a clustering similar to that found by DBSCAN while being consistently faster. Some properties of leaders as prototypes are also formally established.
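
The two-stage idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract gives no pseudocode, so the function names (`leaders`, `cluster_leaders`), the parameters (`tau`, `eps`, `min_pts`), and the density estimate (summing follower counts of leaders within `eps`) are all assumptions made here for illustration. The first stage scans the data once and keeps a follower count per leader; the second stage runs a DBSCAN-style expansion over the leaders only.

```python
import math

def leaders(points, tau):
    """Single scan: a point joins the first leader within tau,
    otherwise it becomes a new leader. counts[i] is the number of
    points represented by leader i (the retained density information)."""
    leader_pts, counts = [], []
    for p in points:
        for i, l in enumerate(leader_pts):
            if math.dist(p, l) <= tau:
                counts[i] += 1
                break
        else:
            leader_pts.append(p)
            counts.append(1)
    return leader_pts, counts

def cluster_leaders(leader_pts, counts, eps, min_pts):
    """DBSCAN-style clustering over leaders only.
    Assumption: a leader is 'dense' when the follower counts of all
    leaders within eps of it sum to at least min_pts.
    Returns one label per leader; -1 marks noise."""
    n = len(leader_pts)
    neighbors = [
        [j for j in range(n) if math.dist(leader_pts[i], leader_pts[j]) <= eps]
        for i in range(n)
    ]
    dense = [sum(counts[j] for j in neighbors[i]) >= min_pts for i in range(n)]
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if not dense[i] or labels[i] != -1:
            continue
        labels[i] = cluster
        stack = [i]
        while stack:  # expand the cluster through dense leaders
            u = stack.pop()
            for v in neighbors[u]:
                if labels[v] == -1:
                    labels[v] = cluster
                    if dense[v]:
                        stack.append(v)
        cluster += 1
    return labels
```

In this sketch the pairwise distance computations of the second stage involve only the leaders, which are typically far fewer than the original points; every original point would then inherit the label of its leader.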