Parallel k-means clustering algorithm on DNA dataset

  • Authors:
  • Fazilah Othman;Rosni Abdullah;Nur’Aini Abdul Rashid;Rosalina Abdul Salam

  • Affiliations:
  • School of Computer Science, Universiti Sains Malaysia, Penang, Malaysia;School of Computer Science, Universiti Sains Malaysia, Penang, Malaysia;School of Computer Science, Universiti Sains Malaysia, Penang, Malaysia;School of Computer Science, Universiti Sains Malaysia, Penang, Malaysia

  • Venue:
  • PDCAT'04 Proceedings of the 5th International Conference on Parallel and Distributed Computing: Applications and Technologies
  • Year:
  • 2004

Abstract

Clustering is the division of data into groups of similar objects. K-means has been used in much clustering work because the algorithm is simple. Our main effort is to parallelize the k-means clustering algorithm. The parallel version exploits the inherent parallelism in the Distance Calculation and Centroid Update phases. The parallel k-means algorithm is designed so that each of the P participating nodes is responsible for handling n/P of the n data points. We ran the program on a Linux cluster with up to eight nodes using the message-passing programming model. We evaluated performance by the percentage of correct answers and by speed-up. The results show that our parallel k-means program performs relatively well on large datasets.
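The data-parallel scheme described above can be illustrated with a minimal sketch. It is not the authors' program. It assumes the data points are already numeric feature vectors (the abstract does not say how DNA sequences are encoded). It also simulates the P nodes inside a single process: each simulated node assigns its own block of about n/P points to the nearest centroid (Distance Calculation) and returns per-cluster partial sums and counts. Those partials are then summed element-wise, standing in for the reduction a real cluster would perform with a message-passing call such as `MPI_Allreduce`, before every node computes the same new centroids (Centroid Update). The names `block_partition`, `local_step` and `parallel_kmeans` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def block_partition(n: int, p: int) -> List[range]:
    """Split point indices 0..n-1 into p contiguous blocks of roughly n/p points each."""
    base, extra = divmod(n, p)
    blocks, start = [], 0
    for rank in range(p):
        size = base + (1 if rank < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def nearest(point: Vector, centroids: List[List[float]]) -> int:
    """Distance Calculation phase: index of the closest centroid (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(point, centroids[j])))


def local_step(points: Sequence[Vector], idx: range,
               centroids: List[List[float]]) -> Tuple[List[List[float]], List[int]]:
    """Work done by one (simulated) node on its own block: per-cluster partial sums and counts."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for i in idx:
        j = nearest(points[i], centroids)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += points[i][d]
    return sums, counts


def parallel_kmeans(points: Sequence[Vector], init: Sequence[Vector],
                    p: int, iters: int) -> List[List[float]]:
    """k-means with the data split over p simulated nodes; the reduction is a plain sum here."""
    blocks = block_partition(len(points), p)
    centroids = [list(map(float, c)) for c in init]
    k, dim = len(centroids), len(centroids[0])
    for _ in range(iters):
        # Each simulated node processes only its own block of points.
        partials = [local_step(points, b, centroids) for b in blocks]
        # Stand-in for an all-reduce: element-wise sum of every node's partials.
        total_sums = [[sum(s[j][d] for s, _ in partials) for d in range(dim)] for j in range(k)]
        total_counts = [sum(c[j] for _, c in partials) for j in range(k)]
        # Centroid Update phase: every node can compute the same new centroids.
        for j in range(k):
            if total_counts[j] > 0:
                centroids[j] = [s / total_counts[j] for s in total_sums[j]]
    return centroids
```

For example, `parallel_kmeans([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], [(0, 0), (10, 10)], p=4, iters=5)` yields the same centroids as the single-node run with `p=1`. Only the per-cluster sums and counts (k vectors plus k integers) need to be exchanged per iteration, not the n points themselves, which is one reason this style of partitioning tends to pay off mainly on large datasets.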