MR-DBSCAN: An Efficient Parallel Density-Based Clustering Algorithm Using MapReduce

Authors:
Yaobin He; Haoyu Tan; Wuman Luo; Huajian Mao; Di Ma; Shengzhong Feng; Jianping Fan
Venue:
ICPADS '11 Proceedings of the 2011 IEEE 17th International Conference on Parallel and Distributed Systems
Year:
2011

Abstract

Data clustering is an important data mining technique that plays a crucial role in numerous scientific applications. However, clustering has become challenging because real-world datasets have been growing rapidly to extremely large scales. Meanwhile, MapReduce is a desirable parallel programming platform that is widely applied in many data-processing fields. In this paper, we propose an efficient parallel density-based clustering algorithm and implement it as a four-stage MapReduce paradigm. Furthermore, we adopt a quick partitioning strategy for large-scale non-indexed data. We also study the merge metric among bordering partitions and optimize it. Finally, we evaluate our work on real large-scale datasets using the Hadoop platform. The results show that our approach achieves efficient speedup and scaleup.
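The abstract names the main ingredients (partitioning non-indexed data, local density-based clustering, and merging clusters among bordering partitions) but not their details. The following is a minimal single-process Python sketch of that general partition, cluster, and merge pattern. It assumes 2-D points, equal-width strips on x with an eps-wide overlap margin, brute-force neighbor search, and union-find merging. The function names (`partition`, `local_clusters`, `mr_dbscan_sketch`) are hypothetical. This is not the paper's quick partitioning strategy, its merge optimization, or its four-stage MapReduce implementation.

```python
"""Single-process sketch: partition -> local DBSCAN -> merge across borders.

Assumptions (not from the paper): 2-D points, equal-width strips on x with an
eps-wide overlap margin, brute-force neighbor search, union-find merging.
"""
import math


def neighbors(pts, ids, i, eps):
    """Ids in `ids` within eps of point i (i itself included)."""
    return [j for j in ids if math.dist(pts[i], pts[j]) <= eps]


def partition(pts, n_parts, eps):
    """Hypothetical partitioner: strip k owns x in [lo + k*w, lo + (k+1)*w)
    and also receives copies of points up to eps beyond either edge."""
    xs = [p[0] for p in pts]
    lo, hi = min(xs), max(xs)
    w = (hi - lo) / n_parts or 1.0
    owner = [min(int((x - lo) / w), n_parts - 1) for x in xs]
    members = [[i for i, x in enumerate(xs)
                if lo + k * w - eps <= x <= lo + (k + 1) * w + eps]
               for k in range(n_parts)]
    return owner, members


def local_clusters(pts, ids, eps, min_pts):
    """DBSCAN-style clustering restricted to one partition.

    Core points within eps of each other share a local label. Returns the
    locally core ids and, per id, the labels of core points within eps."""
    nbrs = {i: neighbors(pts, ids, i, eps) for i in ids}
    core = {i for i in ids if len(nbrs[i]) >= min_pts}
    label, next_label = {}, 0
    for c in core:
        if c in label:
            continue
        label[c] = next_label
        stack = [c]
        while stack:  # flood-fill through density-connected core points
            u = stack.pop()
            for v in nbrs[u]:
                if v in core and v not in label:
                    label[v] = next_label
                    stack.append(v)
        next_label += 1
    touch = {i: {label[c] for c in nbrs[i] if c in core} for i in ids}
    return core, touch


def mr_dbscan_sketch(pts, eps, min_pts, n_parts=2):
    """Return one label per point: a cluster id >= 0, or -1 for noise."""
    owner, members = partition(pts, n_parts, eps)
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    is_core = [False] * len(pts)
    touched = [set() for _ in pts]  # (partition, local label) keys per point
    for k, ids in enumerate(members):  # each strip would be an independent task
        core, touch = local_clusters(pts, ids, eps, min_pts)
        for i in ids:
            if owner[i] == k:  # only the owner sees i's full neighborhood
                is_core[i] = i in core
            touched[i] |= {(k, lab) for lab in touch[i]}

    # Merge: local clusters with a core point within eps of a truly core
    # point belong to the same global cluster.
    for i, keys in enumerate(touched):
        if is_core[i]:
            keys = sorted(keys)
            for key in keys[1:]:
                parent[find(key)] = find(keys[0])

    # Relabel densely; points with no core point within eps are noise.
    roots, labels = {}, []
    for keys in touched:
        if not keys:
            labels.append(-1)
        else:
            labels.append(roots.setdefault(find(min(keys)), len(roots)))
    return labels
```

For example, with a horizontal chain of 11 points spaced 1 apart, a separate group of three points, and one isolated point, `mr_dbscan_sketch(pts, 1.5, 3, n_parts=2)` should keep the chain as a single cluster even though it crosses the strip boundary. The key design choice is the margin: a point's core status is trusted only in the strip that owns it, because only there is its full eps-neighborhood guaranteed to be present, and locally detected core points are never false positives. In a real MapReduce job the per-strip `local_clusters` calls would run as parallel tasks and the union-find merge as a later stage; the plain loop here only stands in for that.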