Research on Clustering Algorithm and Its Parallelization Strategy

  • Authors:
  • Lingjuan Li; Yang Xi

  • Venue:
  • ICCIS '11 Proceedings of the 2011 International Conference on Computational and Information Sciences
  • Year:
  • 2011

Quantified Score

Hi-index 0.01

Abstract

As a hot topic of recent study, cloud computing can help us analyze and process massive data effectively. Clustering is one of the important tasks of data mining. This paper focuses on how to improve the performance of clustering algorithms on massive data. A hierarchy-based DBSCAN algorithm (named HDBSCAN) is proposed by improving the existing density-based clustering algorithm DBSCAN, and parallel execution strategies for HDBSCAN on the MapReduce model of cloud computing are designed. The performance of HDBSCAN is tested experimentally on Hadoop, a cloud computing platform. The experimental results show that HDBSCAN can effectively improve the efficiency of clustering massive data.
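
For readers unfamiliar with the baseline that the abstract builds on, the following is a minimal single-machine sketch of the standard DBSCAN algorithm. It is not the paper's HDBSCAN, and it does not reproduce the MapReduce strategy, neither of which the abstract specifies in detail. The names `dbscan`, `region_query`, `eps`, and `min_pts` are illustrative choices, and the brute-force neighbor search is an assumption made for brevity.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i], including i."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not yet visited"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        # points[i] is a core point: start a new cluster and expand it.
        cluster += 1
        labels[i] = cluster
        seeds = [j for j in neighbors if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also core: keep expanding
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

Each call to `region_query` scans every point, so this sketch takes quadratic time in the number of points. That cost on massive data is the kind of bottleneck that motivates a parallel design such as the one the abstract describes.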