Capturing and querying workflow runtime provenance with PROV: a practical approach

  • Authors:
  • Flavio Costa;Vítor Silva;Daniel de Oliveira;Kary Ocaña;Eduardo Ogasawara;Jonas Dias;Marta Mattoso

  • Affiliations:
  • COPPE/Federal University of Rio de Janeiro, Brazil;COPPE/Federal University of Rio de Janeiro, Brazil;COPPE/Federal University of Rio de Janeiro, Brazil;COPPE/Federal University of Rio de Janeiro, Brazil;COPPE/Federal University of Rio de Janeiro, Brazil and CEFET-RJ, Brazil;COPPE/Federal University of Rio de Janeiro, Brazil;COPPE/Federal University of Rio de Janeiro, Brazil

  • Venue:
  • Proceedings of the Joint EDBT/ICDT 2013 Workshops
  • Year:
  • 2013

Abstract

Scientific workflows are commonly used to model and execute large-scale scientific experiments. They represent key resources for scientists and are enacted and managed by Scientific Workflow Management Systems (SWfMS). Each SWfMS has its own approach to executing workflows and to capturing and managing their provenance data. Due to the large scale of experiments, it may be impractical to analyze provenance data only after the execution ends. A single experiment may take weeks to run, even in high performance computing environments. Thus, scientists need to monitor the experiment during its execution, and this can be done through provenance data. Runtime provenance analysis allows scientists to monitor workflow execution and to take actions before it ends (i.e., workflow steering). This provenance data can also be used to fine-tune the parallel execution of the workflow dynamically. We use the PROV data model as a basic framework for modeling and providing runtime provenance as a database that can be queried even during the execution. This database is agnostic of the SWfMS and workflow engine. We show the benefits of representing and sharing runtime provenance data for improving experiment management as well as the analysis of the scientific data.
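
To make the idea of a provenance database that can be queried during execution more concrete, the following is a minimal sketch in Python using the standard sqlite3 module. It is not the schema or implementation from the paper. The table names loosely follow PROV concepts (activity, entity, used, wasGeneratedBy), and every column name, helper function, and sample value here is a hypothetical chosen for illustration.

```python
import sqlite3

# Hypothetical, simplified PROV-style schema. Table and column names are
# illustrative assumptions, not the schema used in the paper.
SCHEMA = """
CREATE TABLE activity (
    id TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    started_at REAL NOT NULL,
    ended_at REAL            -- NULL while the activity is still running
);
CREATE TABLE entity (
    id TEXT PRIMARY KEY
);
CREATE TABLE used (
    activity_id TEXT REFERENCES activity(id),
    entity_id TEXT REFERENCES entity(id)
);
CREATE TABLE was_generated_by (
    entity_id TEXT REFERENCES entity(id),
    activity_id TEXT REFERENCES activity(id)
);
"""


def open_store():
    """Create an in-memory provenance store with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def start_activity(conn, act_id, name, started_at, inputs=()):
    """Record that an activity started and which entities it used."""
    conn.execute(
        "INSERT INTO activity VALUES (?, ?, ?, NULL)", (act_id, name, started_at)
    )
    for ent in inputs:
        conn.execute("INSERT OR IGNORE INTO entity VALUES (?)", (ent,))
        conn.execute("INSERT INTO used VALUES (?, ?)", (act_id, ent))
    conn.commit()  # commit right away so monitoring queries see the record


def end_activity(conn, act_id, ended_at, outputs=()):
    """Record that an activity finished and which entities it generated."""
    conn.execute(
        "UPDATE activity SET ended_at = ? WHERE id = ?", (ended_at, act_id)
    )
    for ent in outputs:
        conn.execute("INSERT OR IGNORE INTO entity VALUES (?)", (ent,))
        conn.execute("INSERT INTO was_generated_by VALUES (?, ?)", (ent, act_id))
    conn.commit()


def running_activities(conn, now):
    """Runtime query: activities that have not ended, with elapsed time."""
    rows = conn.execute(
        "SELECT id, name, ? - started_at FROM activity "
        "WHERE ended_at IS NULL ORDER BY started_at",
        (now,),
    )
    return rows.fetchall()


def direct_inputs(conn, entity_id):
    """Entities used by the activity that generated the given entity."""
    rows = conn.execute(
        "SELECT u.entity_id FROM was_generated_by g "
        "JOIN used u ON u.activity_id = g.activity_id "
        "WHERE g.entity_id = ? ORDER BY u.entity_id",
        (entity_id,),
    )
    return [r[0] for r in rows]


if __name__ == "__main__":
    conn = open_store()
    start_activity(conn, "a1", "align", 0.0, inputs=["seq.fasta"])
    end_activity(conn, "a1", 5.0, outputs=["aligned.phy"])
    start_activity(conn, "a2", "build_tree", 5.0, inputs=["aligned.phy"])
    print(running_activities(conn, now=12.0))
    print(direct_inputs(conn, "aligned.phy"))
```

Because each record is committed as soon as an activity starts or ends, a monitoring query such as running_activities can be issued while the workflow is still executing. A production store would likely use a shared database server rather than an in-memory SQLite connection, and would cover more of the PROV model (agents, plans, associations) than this sketch does.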