A MapReduce workflow system for architecting scientific data intensive applications

  • Authors:
  • Phuong Nguyen; Milton Halem

  • Affiliations:
  • University of Maryland Baltimore County, Baltimore, USA; University of Maryland Baltimore County, Baltimore, USA

  • Venue:
  • Proceedings of the 2nd International Workshop on Software Engineering for Cloud Computing
  • Year:
  • 2011

Abstract

MapReduce is a promising model for developing both scalable business and scientific data-intensive applications. However, few existing scientific workflow systems can take advantage of the MapReduce programming model. We propose a workflow system for structuring and orchestrating MapReduce jobs in scientific data-intensive workflows. The system consists of a simple C++ workflow design API, a job scheduler, and runtime support for the Hadoop and Sector/Sphere frameworks. A climate satellite data processing and analysis application is developed as a use case and evaluation of the workflow system. The evaluation shows that the workflow system can automate the steps of this climate data-intensive application, from data gridding to complex data analysis. The performance of the climate analysis application is significantly improved by the MapReduce-enabled workflow compared with sequential, embarrassingly parallel methods, and the overhead of the workflow system is negligible. A graphical user interface for the workflow system is still under development.
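
The abstract does not show the workflow design API itself, so the sketch below is only a rough, hypothetical illustration of what a minimal C++ API of this kind might look like: stages stand in for MapReduce jobs (e.g., gridding followed by analysis), and a simple scheduler runs them in dependency order. The names, types, and structure are assumptions made for illustration, not the authors' actual interface.

```cpp
// Hypothetical sketch (not the paper's API): a workflow is a set of named
// stages, each standing in for a MapReduce job submission, with explicit
// data dependencies. The scheduler runs stages in topological order.
#include <functional>
#include <iostream>
#include <map>
#include <queue>
#include <string>
#include <vector>

struct Stage {
    std::string name;                    // e.g. "gridding", "analysis"
    std::function<void()> run;           // placeholder for submitting a MapReduce job
    std::vector<std::string> dependsOn;  // names of upstream stages
};

class Workflow {
public:
    void addStage(Stage s) { stages_[s.name] = std::move(s); }

    // Execute stages once all of their upstream stages have finished.
    void execute() {
        std::map<std::string, int> pending;                         // unfinished upstream count
        std::map<std::string, std::vector<std::string>> downstream; // reverse edges
        std::queue<std::string> ready;

        for (const auto& [name, s] : stages_) {
            pending[name] = static_cast<int>(s.dependsOn.size());
            for (const auto& dep : s.dependsOn) downstream[dep].push_back(name);
            if (s.dependsOn.empty()) ready.push(name);
        }
        while (!ready.empty()) {
            const std::string name = ready.front();
            ready.pop();
            stages_[name].run();  // in a real system: submit the job and wait for completion
            for (const auto& next : downstream[name])
                if (--pending[next] == 0) ready.push(next);
        }
    }

private:
    std::map<std::string, Stage> stages_;
};

int main() {
    Workflow wf;
    wf.addStage({"gridding", [] { std::cout << "grid satellite radiances\n"; }, {}});
    wf.addStage({"analysis", [] { std::cout << "compute climate statistics\n"; }, {"gridding"}});
    wf.execute();  // runs gridding, then analysis
}
```

In a real deployment the `run` callback would hand the stage off to the underlying framework (Hadoop or Sector/Sphere) rather than print a message; the point of the sketch is only the separation of workflow description from scheduling.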