Provenance traces from Chiron parallel workflow engine

  • Authors: Felipe Horta; Vítor Silva; Flavio Costa; Daniel de Oliveira; Kary Ocaña; Eduardo Ogasawara; Jonas Dias; Marta Mattoso
  • Affiliations: Federal University of Rio de Janeiro, Brazil; Federal University of Rio de Janeiro, Brazil; Federal University of Rio de Janeiro, Brazil; Federal University of Rio de Janeiro, Brazil; Federal University of Rio de Janeiro, Brazil; Federal University of Rio de Janeiro, Brazil and CEFET-RJ, Brazil; Federal University of Rio de Janeiro, Brazil; Federal University of Rio de Janeiro, Brazil
  • Venue: Proceedings of the Joint EDBT/ICDT 2013 Workshops
  • Year: 2013

Abstract

Scientific workflows are commonly used to model and execute large-scale scientific experiments. They represent key resources for scientists and are managed by Scientific Workflow Management Systems (SWfMS). The different languages used by SWfMS may affect the way the workflow engine executes the workflow, sometimes limiting optimization opportunities. To tackle this issue, we recently proposed a scientific workflow algebra [1]. This algebra is inspired by the relational algebra of databases, and it enables automatic optimization of scientific workflows to be executed in parallel in high performance computing (HPC) environments. Accordingly, the experiments presented in this paper were executed in Chiron, a parallel scientific workflow engine implemented to support the scientific workflow algebra. Before executing the workflow, Chiron stores the prospective provenance [2] of the workflow in its provenance database. Each workflow is composed of several activities, and each activity consumes relations. As in relational databases, a relation has a set of attributes and is composed of a set of tuples. Each tuple in a relation contains a series of values, each associated with a specific attribute. The tuples of a relation are distributed over the computing resources to be consumed in parallel, according to the workflow activity. During and after the execution, the retrospective provenance [2] is also stored.
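The data model described in the abstract can be illustrated with a small sketch. This is not Chiron's code or API: the names `Relation`, `run_activity`, and `scale` are hypothetical, a thread pool stands in for HPC resources, and the activity is assumed to produce exactly one output tuple per input tuple. The in-memory provenance list is likewise a stand-in for a provenance database.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Relation:
    """Hypothetical relation: attribute names plus tuples of matching arity."""
    attributes: tuple
    tuples: list = field(default_factory=list)

    def __post_init__(self):
        # Every tuple must supply one value per attribute.
        for t in self.tuples:
            if len(t) != len(self.attributes):
                raise ValueError(
                    f"tuple {t!r} does not match attributes {self.attributes!r}"
                )


def run_activity(name, program, relation, out_attributes, workers=4):
    """Consume the tuples of `relation` in parallel with `program`.

    Assumption: one output tuple per input tuple. Returns the output
    relation and one retrospective provenance record per consumed tuple.
    """
    def consume(t):
        # Pair each value with its attribute name before calling the program.
        row = dict(zip(relation.attributes, t))
        return tuple(program(row))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so outputs line up with inputs.
        outputs = list(pool.map(consume, relation.tuples))

    provenance = [
        {"activity": name, "input": t, "output": o}
        for t, o in zip(relation.tuples, outputs)
    ]
    return Relation(out_attributes, outputs), provenance


def scale(row):
    """Toy program standing in for a scientific application."""
    return (row["id"], row["value"] * 2)


inp = Relation(("id", "value"), [(1, 10), (2, 20), (3, 30)])
out, prov = run_activity("scale", scale, inp, ("id", "doubled"))
```

Here `prov` holds one record per consumed tuple, which is the kind of information retrospective provenance captures; a real engine would also record details such as timing and the resource that ran each tuple.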