Provenance and scientific workflows: challenges and opportunities

  • Authors:
  • Susan B. Davidson;Juliana Freire

  • Affiliations:
  • University of Pennsylvania, Philadelphia, PA, USA;University of Utah, Salt Lake City, UT, USA

  • Venue:
  • Proceedings of the 2008 ACM SIGMOD international conference on Management of data
  • Year:
  • 2008

Quantified Score

Hi-index 0.02

Visualization

Abstract

Provenance in the context of workflows, both for the data they derive and for their specification, is an essential component to allow for result reproducibility, sharing, and knowledge re-use in the scientific community. Several workshops have been held on the topic, and it has been the focus of many research projects and prototype systems. This tutorial provides an overview of research issues in provenance for scientific workflows, with a focus on recent literature and technology in this area. It is aimed at a general database research audience and at people who work with scientific data and workflows. We will (1) provide a general overview of scientific workflows, (2) describe research on provenance for scientific workflows and show in detail how provenance is supported in existing systems; (3) discuss emerging applications that are enabled by provenance; and (4) outline open problems and new directions for database-related research.