Enabling re-executions of parallel scientific workflows using runtime provenance data

  • Authors:
  • Flávio Costa;Daniel de Oliveira;Kary A. C. S. Ocaña;Eduardo Ogasawara;Marta Mattoso

  • Affiliations:
  • COPPE, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil;COPPE, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil;COPPE, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil;COPPE, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil, CEFET, Rio de Janeiro, Brazil;COPPE, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil

  • Venue:
  • IPAW'12: Proceedings of the 4th International Conference on Provenance and Annotation of Data and Processes
  • Year:
  • 2012

Abstract

Capturing provenance data in scientific workflows is a key issue because it enables reproducibility and evaluation of results. Many of these workflows generate around 100,000 tasks that execute in parallel in High Performance Computing environments, such as large clusters and clouds. SciCumulus is a workflow engine for parallel execution in clouds. Activity failure is almost inevitable in clouds, where virtual machine failures are a reality rather than a possibility. We present SciMultaneous, a service architecture that manages re-executions of failed scientific workflow tasks using runtime provenance. Experimental results on clouds showed that SciMultaneous considerably increases workflow completion and reduces the total execution time of the workflow (counting both executions and re-executions) by up to 11.5% when compared to ad-hoc approaches.
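
The general idea of driving re-executions from runtime provenance can be illustrated with a minimal sketch. This is not the SciMultaneous architecture or the SciCumulus API, neither of which is described in the abstract. It assumes a hypothetical in-memory provenance log (`ProvenanceStore`), hypothetical status labels (`FINISHED`, `FAILED`), and a simple bounded retry policy (`max_attempts`), all introduced here purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ProvenanceStore:
    """Hypothetical in-memory provenance log: one list of attempt statuses per task."""
    attempts: Dict[str, List[str]] = field(default_factory=dict)

    def record(self, task_id: str, status: str) -> None:
        # Append the outcome of one execution attempt for this task.
        self.attempts.setdefault(task_id, []).append(status)

    def failed_tasks(self) -> List[str]:
        # A task needs re-execution if its most recent attempt failed.
        return [t for t, statuses in self.attempts.items()
                if statuses[-1] == "FAILED"]


def run_with_reexecution(tasks: Dict[str, Callable[[], None]],
                         store: ProvenanceStore,
                         max_attempts: int = 3) -> List[str]:
    """Run every task once, then re-run only the failures found in the log.

    Returns the ids of tasks that are still failing after max_attempts rounds.
    The retry policy is an assumption, not the policy used by SciMultaneous.
    """
    pending = list(tasks)
    for _ in range(max_attempts):
        for task_id in pending:
            try:
                tasks[task_id]()
                store.record(task_id, "FINISHED")
            except Exception:
                # Stand-in for a task lost to, for example, a virtual machine failure.
                store.record(task_id, "FAILED")
        # Query the provenance log instead of tracking failures separately.
        pending = store.failed_tasks()
        if not pending:
            break
    return pending
```

In this sketch the provenance log is the only source of truth about which tasks must run again, so completed tasks are never repeated and the full attempt history of each task remains available for later analysis.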