OPTICS is a hierarchical density-based data clustering algorithm that discovers arbitrary-shaped clusters and eliminates noise using adjustable reachability distance thresholds. Parallelizing OPTICS is considered challenging because the algorithm exhibits a strongly sequential data access order. We present a scalable parallel OPTICS algorithm (POPTICS) designed using graph algorithmic concepts. To break the data access sequentiality, POPTICS exploits the similarities between the OPTICS algorithm and Prim's Minimum Spanning Tree algorithm. Additionally, we use the disjoint-set data structure to achieve high parallelism for distributed cluster extraction. Using high-dimensional datasets containing up to a billion floating-point numbers, we show scalable speedups of up to 27.5 for our OpenMP implementation on a 40-core shared-memory machine, and up to 3,008 for our MPI implementation on a 4,096-core distributed-memory machine. We also show that the quality of the results given by POPTICS is comparable to that given by the classical OPTICS algorithm.
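The role of the disjoint-set structure in cluster extraction can be illustrated with a small sequential sketch. It is not the authors' implementation: it assumes the spanning-tree edges are already available as `(u, v, weight)` triples, with the weight standing in for a reachability distance, and it merges the endpoints of every edge whose weight does not exceed a chosen threshold. The names `DisjointSet`, `extract_clusters`, and `eps` are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


class DisjointSet:
    """Union-find with path halving and union by rank."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x: int) -> int:
        # Path halving: point each visited node at its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Attach the shallower tree under the deeper one.
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1


def extract_clusters(
    n_points: int,
    edges: List[Tuple[int, int, float]],
    eps: float,
) -> List[int]:
    """Label points by merging endpoints of edges with weight <= eps.

    Assumption: `edges` are spanning-tree edges weighted by a
    reachability-like distance. Points sharing a label form one cluster.
    """
    ds = DisjointSet(n_points)
    for u, v, w in edges:
        if w <= eps:
            ds.union(u, v)
    return [ds.find(i) for i in range(n_points)]


if __name__ == "__main__":
    # Hypothetical toy tree: points 0-2 are close, 3-4 are close,
    # and the two groups are joined by one long edge.
    tree_edges = [(0, 1, 0.5), (1, 2, 0.7), (2, 3, 5.0), (3, 4, 0.4)]
    print(extract_clusters(5, tree_edges, eps=1.0))
```

Because each edge is handled independently of the order in which points were visited, the union operations need not follow a sequential traversal, which is the property that makes this style of extraction a natural candidate for parallel execution. Treating single-point components as noise would be a further assumption layered on top of this sketch.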