Managing Faults for Distributed Workflows over Grids

  • Authors:
  • Onyeka Ezenwoye;M. Brian Blake;Gargi Dasgupta;S. Masoud Sadjadi;Selim Kalayci;Liana L. Fong

  • Affiliations:
  • South Dakota State University;University of Notre Dame;IBM Research;Florida International University;Florida International University;IBM T.J. Watson Research Center

  • Venue:
  • IEEE Internet Computing
  • Year:
  • 2010

Quantified Score

Hi-index 0.01

Abstract

Grid applications composed of multiple, distributed jobs are common areas for applying Web-scale workflows. Workflows over grid infrastructures are inherently complicated because they must both functionally assure the entire process and coordinate the underlying tasks. These applications are often long-running, so fault tolerance becomes a significant concern. Transparency is vital to understanding fault tolerance in these environments.