DBSCAN is a well-known density-based clustering algorithm capable of discovering arbitrarily shaped clusters and eliminating noise data. However, parallelizing DBSCAN is challenging because the algorithm accesses data in an inherently sequential order. Moreover, existing parallel implementations adopt a master-slave strategy, which can easily cause an unbalanced workload and hence low parallel efficiency. We present a new parallel DBSCAN algorithm (PDSDBSCAN) using graph algorithmic concepts. More specifically, we employ the disjoint-set data structure to break the access sequentiality of DBSCAN. In addition, we use a tree-based bottom-up approach to construct the clusters, which yields a better-balanced workload distribution. We implement the algorithm for both shared and distributed memory. Using data sets containing up to several hundred million high-dimensional points, we show that PDSDBSCAN significantly outperforms the master-slave approach, achieving speedups of up to 25.97 using 40 cores on a shared-memory architecture and up to 5,765 using 8,192 cores on a distributed-memory architecture.
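To illustrate how a disjoint-set structure can remove the dependence on visiting order, here is a minimal sequential sketch in Python. It is not the authors' implementation: the names `DisjointSet` and `dbscan_disjoint_set` are hypothetical, the neighbor query is a brute-force scan rather than a spatial index, and the parallel parts (the tree-based bottom-up merging and the shared- and distributed-memory variants) are not shown. Each core point is unioned with its core neighbors and with any border neighbor not yet claimed, so clusters emerge as union-find trees regardless of the order in which points are processed.

```python
from math import dist  # Euclidean distance, Python 3.8+


class DisjointSet:
    """Union-find over the integers 0..n-1 (illustrative helper)."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        # Walk to the root, shortening the path along the way.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # Link the root with the larger index under the smaller one.
            self.parent[max(ra, rb)] = min(ra, rb)


def dbscan_disjoint_set(points, eps, min_pts):
    """Return one label per point: a tree root id, or -1 for noise.

    Assumption: min_pts counts the point itself, and neighbors are
    found by an O(n^2) brute-force scan for simplicity.
    """
    n = len(points)
    ds = DisjointSet(n)
    is_core = [False] * n
    assigned = [False] * n  # True once a point belongs to some cluster

    for i in range(n):
        neighbors = [j for j in range(n) if dist(points[i], points[j]) <= eps]
        if len(neighbors) < min_pts:
            continue  # not a core point; may still be claimed as a border point
        is_core[i] = True
        assigned[i] = True
        for j in neighbors:
            if is_core[j]:
                ds.union(i, j)  # two core points within eps share a cluster
            elif not assigned[j]:
                assigned[j] = True  # claim an unclaimed neighbor for this cluster
                ds.union(i, j)

    return [ds.find(i) if assigned[i] else -1 for i in range(n)]


if __name__ == "__main__":
    sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(dbscan_disjoint_set(sample, eps=1.5, min_pts=3))
```

Because the only operations on shared state are `find` and `union`, the outer loop over points could in principle be split across threads or processes and the resulting trees merged afterward, which is the property the abstract relies on.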