Density-based clustering algorithms are a widely used class of data mining techniques that can find irregularly shaped clusters and can cluster data without prior knowledge of the number of clusters the data contains. DBSCAN is the best-known density-based clustering algorithm. We introduce our version of DBSCAN, called Mr. Scan, which uses a hybrid parallel implementation that combines the MRNet tree-based distribution network with GPGPU-equipped nodes. Mr. Scan avoids the problems of existing implementations by effectively partitioning the point space and by optimizing DBSCAN's computation over dense data regions. We tested Mr. Scan on both a geolocated Twitter dataset and image data obtained from the Sloan Digital Sky Survey. At its largest scale, Mr. Scan clustered 6.5 billion points from the Twitter dataset on 8,192 GPU nodes of the Cray Titan supercomputer in 17.3 minutes. All other parallel DBSCAN implementations have demonstrated the ability to cluster only up to 100 million points.
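
For readers unfamiliar with DBSCAN, the following is a minimal sequential sketch of the textbook algorithm in Python. It is not Mr. Scan and includes none of its partitioning, MRNet, or GPGPU components; the function names (`region_query`, `dbscan`), the brute-force neighbor search, and the parameter names `eps` and `min_pts` are illustrative assumptions chosen for brevity. It shows only why the number of clusters need not be known in advance: clusters grow outward from any point that has enough neighbors within a fixed radius.

```python
from collections import deque
from math import dist

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (brute force, O(n))."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Textbook sequential DBSCAN; returns one cluster label per point."""
    labels = [None] * len(points)  # None means "not yet visited"
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            # Not a core point; may later be relabeled as a border point.
            labels[i] = NOISE
            continue
        # points[i] is a core point: start a new cluster and expand it.
        labels[i] = cluster
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point
        cluster += 1
    return labels
```

As a small usage example, calling `dbscan` with `eps=1.5` and `min_pts=3` on two well-separated 2x2 squares of points plus one distant point yields two clusters and one noise point. The brute-force `region_query` makes this sketch quadratic in the number of points, which is the kind of cost that partitioning the point space is meant to avoid at scale.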