Case study of scientific data processing on a cloud using Hadoop

  • Authors:
  • Chen Zhang; Hans De Sterck; Ashraf Aboulnaga; Haig Djambazian; Rob Sladek

  • Affiliations:
  • David R. Cheriton School of Computer Science, University of Waterloo, Ontario, Canada; Department of Applied Mathematics, University of Waterloo, Ontario, Canada; David R. Cheriton School of Computer Science, University of Waterloo, Ontario, Canada; McGill University and Genome Quebec Innovation Centre, Montreal, Quebec, Canada; McGill University and Genome Quebec Innovation Centre, Montreal, Quebec, Canada

  • Venue:
  • HPCS'09 Proceedings of the 23rd international conference on High Performance Computing Systems and Applications
  • Year:
  • 2009

Abstract

With the increasing popularity of cloud computing, Hadoop has become a widely used open source cloud computing framework for large-scale data processing. However, few efforts have been made to demonstrate its applicability to real-world application scenarios outside server-side computations such as web indexing. In this paper, we use the Hadoop cloud computing framework to develop a user application that allows processing of scientific data on clouds. We describe a simple extension to Hadoop’s MapReduce that allows it to handle scientific data processing problems with arbitrary input formats and explicit control over how the input is split. We use this approach to develop a Hadoop-based cloud computing application that processes sequences of microscope images of live cells, and we test its performance. We also discuss how the approach can be generalized to more complicated scientific data processing problems.
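
The idea of explicit control over input splitting can be illustrated with a minimal sketch. This is not the authors' extension and does not use the Hadoop API; it is plain Python, and the names `make_splits`, `run_map_phase`, `read_frame`, and `map_fn` are hypothetical. It assumes each image file is an indivisible record and that the application, not the framework, decides which frames go to the same map task.

```python
from typing import Callable, List, Tuple

# Hypothetical sketch: these helpers are illustrative and are not Hadoop calls.

def make_splits(frames: List[str], frames_per_split: int) -> List[List[str]]:
    """Group an ordered list of frame names into contiguous splits.

    Each split stands in for the input of one map task. Frames are never
    cut in the middle, so arbitrary binary formats stay intact.
    """
    if frames_per_split < 1:
        raise ValueError("frames_per_split must be at least 1")
    return [frames[i:i + frames_per_split]
            for i in range(0, len(frames), frames_per_split)]


def run_map_phase(
    splits: List[List[str]],
    read_frame: Callable[[str], bytes],
    map_fn: Callable[[int, str, bytes], Tuple],
) -> List[Tuple]:
    """Apply map_fn to every whole frame, one split at a time.

    This runs sequentially; a real framework would schedule each split
    on a different worker.
    """
    results = []
    for split_id, split in enumerate(splits):
        for name in split:
            results.append(map_fn(split_id, name, read_frame(name)))
    return results


if __name__ == "__main__":
    # Assumed file names and a stand-in reader, for illustration only.
    frames = [f"cell_{i:03d}.img" for i in range(5)]
    splits = make_splits(frames, frames_per_split=2)
    out = run_map_phase(
        splits,
        read_frame=lambda name: name.encode(),
        map_fn=lambda split_id, name, data: (split_id, name, len(data)),
    )
    for row in out:
        print(row)
```

Keeping consecutive frames in the same split is one plausible choice for image sequences, since a map task that compares neighbouring frames then needs no data from other tasks; other groupings are equally possible under this sketch.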