Data Staging Strategies and Their Impact on the Execution of Scientific Workflows

  • Authors:
  • Shishir Bharathi; Ann Chervenak

  • Affiliations:
  • USC Information Sciences Institute, Marina del Rey, CA; USC Information Sciences Institute, Marina del Rey, CA

  • Venue:
  • Proceedings of the Second International Workshop on Data-Aware Distributed Computing
  • Year:
  • 2009

Abstract

Data-intensive workflows process and generate large amounts of data. The strategies used to stage data in and out of compute resources can have a significant impact on the overall execution of a workflow. We study the relationships between data placement services, which perform the staging, and workflow managers, which control the release of computational jobs. We describe a framework that classifies data staging strategies into decoupled, loosely-coupled and tightly-coupled modes, based on the degree of their interaction with the workflow manager. We present the results of simulation studies that investigate the effect of these three classes of data staging strategies on synthetic workflows resembling those from real-world scientific applications.