Parallel k-means clustering algorithm on DNA dataset

Authors:
Fazilah Othman;Rosni Abdullah;Nur’Aini Abdul Rashid;Rosalina Abdul Salam
Affiliations:
School of Computer Science, Universiti Sains Malaysia, Penang, Malaysia;School of Computer Science, Universiti Sains Malaysia, Penang, Malaysia;School of Computer Science, Universiti Sains Malaysia, Penang, Malaysia;School of Computer Science, Universiti Sains Malaysia, Penang, Malaysia
Venue:
PDCAT'04 Proceedings of the 5th International Conference on Parallel and Distributed Computing: Applications and Technologies
Year:
2004

Abstract

Clustering is the division of data into groups of similar objects. K-means has been used in many clustering tasks because of its simplicity. Our main effort is to parallelize the k-means clustering algorithm. The parallel version exploits the inherent parallelism in the Distance Calculation and Centroid Update phases. The parallel k-means algorithm is designed so that each of the P participating nodes is responsible for handling n/P data points. We ran the program on a Linux cluster with a maximum of eight nodes using the message-passing programming model. We evaluated the program by its percentage of correct answers and by its speed-up. The results show that our parallel k-means program performs relatively well on large datasets.
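
The data-parallel scheme described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes numeric feature vectors (the abstract does not say how DNA sequences are encoded), squared Euclidean distance, and caller-supplied initial centroids. Threads stand in for the P cluster nodes, and an in-process sum stands in for the message-passing reduction, so the sketch shows the structure of the computation rather than any real speed-up. The names `local_step` and `parallel_kmeans` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def local_step(chunk, centroids):
    """Distance Calculation on one node: assign each local point to its
    nearest centroid and return per-cluster partial sums and counts."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for point in chunk:
        nearest = min(range(k), key=lambda j: sq_dist(point, centroids[j]))
        counts[nearest] += 1
        for d in range(dim):
            sums[nearest][d] += point[d]
    return sums, counts


def parallel_kmeans(points, centroids, num_nodes, max_iters=100):
    """Sketch of data-parallel k-means with num_nodes simulated nodes."""
    # Each simulated node receives roughly n/P of the n data points.
    chunks = [points[i::num_nodes] for i in range(num_nodes)]
    centroids = [list(c) for c in centroids]
    dim = len(centroids[0])
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        for _ in range(max_iters):
            # Every node works on its own chunk against the same centroids.
            partials = list(
                pool.map(lambda ch: local_step(ch, centroids), chunks)
            )
            # Centroid Update: combine the partial results from all nodes
            # (assumption: stands in for a message-passing reduction).
            new_centroids = []
            for j, old in enumerate(centroids):
                total = sum(counts[j] for _, counts in partials)
                if total == 0:
                    new_centroids.append(old)  # keep an empty cluster as is
                else:
                    new_centroids.append(
                        [sum(sums[j][d] for sums, _ in partials) / total
                         for d in range(dim)]
                    )
            if new_centroids == centroids:
                break  # centroids stopped moving
            centroids = new_centroids
    return centroids
```

A call such as `parallel_kmeans(points, initial_centroids, num_nodes=8)` would mirror the eight-node configuration mentioned above. Only the per-cluster sums and counts are exchanged between nodes in each iteration, so communication volume depends on the number of clusters and dimensions rather than on n.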