CloudWF: A Computational Workflow System for Clouds Based on Hadoop

  • Authors:
  • Chen Zhang; Hans De Sterck

  • Affiliations:
  • David R. Cheriton School of Computer Science, University of Waterloo, Canada; Department of Applied Mathematics, University of Waterloo, Canada

  • Venue:
  • CloudCom '09 Proceedings of the 1st International Conference on Cloud Computing
  • Year:
  • 2009

Abstract

This paper describes CloudWF, a scalable and lightweight computational workflow system for clouds, built on top of Hadoop. CloudWF can run workflow jobs composed of multiple Hadoop MapReduce or legacy programs. Its novelty lies in several aspects: a simple workflow description language that encodes workflow blocks and block-to-block dependencies separately as standalone executable components; a new workflow storage method that uses Hadoop HBase sparse tables to store workflow information internally and reconstruct workflow block dependencies implicitly for efficient workflow execution; transparent file staging with Hadoop DFS; and decentralized workflow execution management that relies on the MapReduce framework for task scheduling and fault tolerance. The paper presents the design and implementation of CloudWF in detail.
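The abstract's central idea, storing blocks and block-to-block dependencies as sparse table entries and reconstructing the workflow DAG implicitly from them, can be illustrated with a minimal sketch. The code below is not CloudWF's actual implementation: a nested map stands in for an HBase sparse table (row key → column qualifier → value), and names such as the `dep:` column prefix are hypothetical conventions chosen for illustration.

```java
import java.util.*;

// Illustrative sketch only: a nested map simulates an HBase sparse table
// (rowKey -> columnQualifier -> value). In CloudWF the table lives in HBase;
// the "dep:" column-naming convention here is an assumption, not the paper's.
public class SparseWorkflowTable {
    // row key = workflow block id; one sparse "dep:" column per prerequisite
    private final Map<String, Map<String, String>> table = new HashMap<>();

    public void addBlock(String blockId) {
        table.computeIfAbsent(blockId, k -> new HashMap<>());
    }

    // encode a block-to-block dependency as a standalone sparse column
    public void addDependency(String block, String prerequisite) {
        addBlock(block);
        addBlock(prerequisite);
        table.get(block).put("dep:" + prerequisite, "pending");
    }

    // reconstruct a block's prerequisites implicitly from its sparse columns
    public Set<String> prerequisitesOf(String block) {
        Set<String> deps = new TreeSet<>();
        for (String col : table.getOrDefault(block, Map.of()).keySet()) {
            if (col.startsWith("dep:")) deps.add(col.substring(4));
        }
        return deps;
    }

    // a block is runnable once all of its dependency columns are satisfied
    public List<String> runnableBlocks(Set<String> finished) {
        List<String> ready = new ArrayList<>();
        for (String block : new TreeSet<>(table.keySet())) {
            if (!finished.contains(block)
                    && finished.containsAll(prerequisitesOf(block))) {
                ready.add(block);
            }
        }
        return ready;
    }

    public static void main(String[] args) {
        SparseWorkflowTable wf = new SparseWorkflowTable();
        wf.addDependency("B", "A");
        wf.addDependency("C", "A");
        wf.addDependency("D", "B");
        wf.addDependency("D", "C");
        System.out.println(wf.runnableBlocks(Set.of()));    // only A has no deps
        System.out.println(wf.runnableBlocks(Set.of("A"))); // B and C unblocked
        System.out.println(wf.prerequisitesOf("D"));        // reconstructed deps
    }
}
```

Because each dependency is its own column, no explicit DAG structure needs to be stored or locked: the executable dependency components can be scanned and evaluated independently, which is what makes the decentralized execution management described in the abstract possible.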