Using domain-specific data to enhance scientific workflow steering queries

  • Authors:
  • João Carlos de A.R. Gonçalves; Daniel de Oliveira; Kary A. C. S. Ocaña; Eduardo Ogasawara; Marta Mattoso

  • Affiliations:
  • COPPE, Federal University of Rio de Janeiro, Brazil; COPPE, Federal University of Rio de Janeiro, Brazil; COPPE, Federal University of Rio de Janeiro, Brazil; COPPE, Federal University of Rio de Janeiro, Brazil and CEFET/RJ, Brazil; COPPE, Federal University of Rio de Janeiro, Brazil

  • Venue:
  • IPAW'12 Proceedings of the 4th international conference on Provenance and Annotation of Data and Processes
  • Year:
  • 2012

Abstract

In scientific workflows, provenance data helps scientists understand, evaluate and reproduce their results. Provenance data generated at runtime can also support workflow steering mechanisms. Providing steering facilities for workflows is considered a challenge because of the dynamic demands that arise during execution. To steer a workflow, scientists should be able, for example, to suspend (or stop) its execution when the approximate solution meets (or deviates from) preset criteria. These criteria are commonly evaluated based on provenance data (execution data) and domain-specific data. We claim that the final decision on whether to interfere with the workflow execution may only become feasible when scientists can steer workflows using provenance data enriched with domain-specific data. In this paper we propose an approach based on specialized software components, named Data Extractor (DE), that acquire domain-specific data from data files produced during a scientific workflow execution. DE gathers domain-specific data from the produced data files and associates it with the existing provenance data in the provenance repository. We have evaluated the proposed approach using a real bioinformatics workflow for comparative genomics executed in the SciCumulus parallel cloud workflow engine.
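
The idea in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the paper's implementation or the SciCumulus schema: the two-table layout, the `key = value` output-file format, the `e_value` field, and all function names are hypothetical. It shows a Data Extractor style step that parses domain-specific values from a produced file, stores them next to the execution record, and then runs a steering query that combines both kinds of data.

```python
import sqlite3


def create_repository():
    """Create a toy in-memory provenance repository (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE activation (
            id INTEGER PRIMARY KEY,
            activity TEXT,
            status TEXT
        );
        CREATE TABLE domain_data (
            activation_id INTEGER REFERENCES activation(id),
            name TEXT,
            value REAL
        );
        """
    )
    return conn


def extract_domain_data(text):
    """Parse 'key = value' lines and keep only the numeric values.

    The file format is an assumption; a real extractor would be written
    for the specific output format of each program in the workflow.
    """
    data = {}
    for line in text.splitlines():
        key, sep, raw = line.partition("=")
        if not sep:
            continue  # line has no 'key = value' structure
        try:
            data[key.strip()] = float(raw)
        except ValueError:
            continue  # non-numeric value, ignored in this sketch
    return data


def attach_domain_data(conn, activation_id, data):
    """Associate extracted values with an existing execution record."""
    conn.executemany(
        "INSERT INTO domain_data VALUES (?, ?, ?)",
        [(activation_id, name, value) for name, value in data.items()],
    )


def activations_to_review(conn, name, threshold):
    """Steering query: running activations whose value exceeds a threshold."""
    rows = conn.execute(
        "SELECT a.id FROM activation a "
        "JOIN domain_data d ON d.activation_id = a.id "
        "WHERE a.status = 'RUNNING' AND d.name = ? AND d.value > ? "
        "ORDER BY a.id",
        (name, threshold),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    repo = create_repository()
    repo.execute("INSERT INTO activation VALUES (1, 'alignment', 'RUNNING')")
    repo.execute("INSERT INTO activation VALUES (2, 'alignment', 'RUNNING')")
    attach_domain_data(repo, 1, extract_domain_data("e_value = 1e-30\n"))
    attach_domain_data(repo, 2, extract_domain_data("e_value = 0.5\n"))
    print(activations_to_review(repo, "e_value", 1e-5))
```

In this sketch the steering decision stays with the scientist: the query only lists the running activations that violate the preset criterion, using a condition that neither the execution data nor the domain-specific data could express alone.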