K-means Clustering in the Cloud -- A Mahout Test

  • Authors:
  • Rui Maximo Esteves; Rui Pais; Chunming Rong

  • Venue:
  • WAINA '11 Proceedings of the 2011 IEEE Workshops of International Conference on Advanced Information Networking and Applications
  • Year:
  • 2011

Abstract

K-Means is a well-known clustering algorithm that has been successfully applied to a wide variety of problems. However, its application has usually been restricted to small datasets. Mahout is a cloud computing approach to K-Means that runs on a Hadoop system. Both Mahout and Hadoop are free and open source. Because they are inexpensive and scalable, these platforms are a promising technology for solving data-intensive problems that were not trivial in the past. In this work we studied the performance of Mahout using a large dataset. The tests ran on Amazon EC2 instances and allowed us to compare the gain in runtime when running on a multi-node cluster. This paper presents some results of ongoing research.