Enhancing and abstracting scientific workflow provenance for data publishing

  • Authors: Pinar Alper; Khalid Belhajjame; Carole A. Goble; Pinar Karagoz

  • Affiliations: University of Manchester, Manchester, UK; University of Manchester, Manchester, UK; University of Manchester, Manchester, UK; Middle East Technical University, Ankara, Turkey

  • Venue: Proceedings of the Joint EDBT/ICDT 2013 Workshops
  • Year: 2013

Abstract

Many scientists use workflows to systematically design and run computational experiments. Once a workflow has been executed, the scientist may want to publish the resulting dataset so that, for example, other scientists can reuse it as input to their own experiments. To do so, the scientist needs to curate the dataset by specifying metadata that describes it, e.g. its derivation history, origins and ownership. To assist the scientist in this task, we explore in this paper the use of provenance traces collected by workflow management systems when enacting workflows. Specifically, we identify the shortcomings of such raw provenance traces in supporting the data publishing task, and propose an approach for deriving distilled, yet more informative, provenance traces that are fit for that task.