Improving DBSCAN's execution time by using a pruning technique on bit vectors

Authors:
Selim Mimaroglu;Emin Aksehirli
Affiliations:
Department of Computer Engineering, Bahcesehir University, Ciragan Caddesi, 34353 Besiktas, Istanbul, Turkey;Department of Computer Engineering, Bahcesehir University, Ciragan Caddesi, 34353 Besiktas, Istanbul, Turkey
Venue:
Pattern Recognition Letters
Year:
2011

Abstract

Clustering is the process of assigning a set of physical or abstract objects to previously unknown groups. The goal of clustering is to place similar objects in the same cluster and dissimilar objects in different clusters, where similarity is evaluated from the attribute values of the objects. Among the many clustering algorithms in the literature, DBSCAN is a well-known density-based one. We improve DBSCAN's execution time for binary data sets under the Hamming distance. We achieve considerable speed gains by combining a novel pruning technique with bit vectors and binary operations. Our method discards the distant neighbors of an object and computes distances only between the object and its possible neighbors. By discarding distant neighbors, we avoid unnecessary distance computations and use less CPU time than the conventional DBSCAN algorithm, while the accuracy of our method remains identical to that of the original DBSCAN. Experiments on real and synthetic data sets demonstrate that our pruning technique yields considerably faster execution times than DBSCAN.
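The abstract does not spell out the pruning rule, so the following Python sketch is only an illustration of the general idea, not the authors' method. It assumes each object is a bit vector stored in a Python `int`, and it prunes with one simple, well-known bound: the Hamming distance between two bit vectors is never smaller than the difference between their numbers of set bits. The names `hamming`, `region_query_naive`, and `PrunedIndex` are hypothetical and chosen for this example.

```python
from bisect import bisect_left, bisect_right


def hamming(a: int, b: int) -> int:
    """Hamming distance between two bit vectors: XOR, then count the set bits."""
    return bin(a ^ b).count("1")


def region_query_naive(points, i, eps):
    """Conventional neighborhood query: compute the distance to every object."""
    return [j for j, q in enumerate(points) if hamming(points[i], q) <= eps]


class PrunedIndex:
    """Assumed pruning scheme (not taken from the paper).

    Since |popcount(a) - popcount(b)| <= hamming(a, b), any object whose
    number of set bits differs from the query's by more than eps cannot be
    a neighbor and is skipped without a distance computation.
    """

    def __init__(self, points):
        self.points = points
        # Number of set bits of every object, computed once.
        self.weights = [bin(p).count("1") for p in points]
        # Object indices ordered by number of set bits, for binary search.
        self.order = sorted(range(len(points)), key=lambda j: self.weights[j])
        self.sorted_weights = [self.weights[j] for j in self.order]

    def region_query(self, i, eps):
        w = self.weights[i]
        # Only objects with a bit count in [w - eps, w + eps] are candidates.
        lo = bisect_left(self.sorted_weights, w - eps)
        hi = bisect_right(self.sorted_weights, w + eps)
        p = self.points[i]
        return sorted(
            j for j in self.order[lo:hi] if hamming(p, self.points[j]) <= eps
        )
```

Because the bound only rules out objects that cannot lie within `eps`, `PrunedIndex.region_query` returns the same neighbors as `region_query_naive`, so a DBSCAN built on either query would form the same clusters. How much work is saved depends on how widely the bit counts of the data are spread.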