Provenance collection support in the Kepler scientific workflow system

  • Authors:
  • Ilkay Altintas; Oscar Barney; Efrat Jaeger-Frank

  • Affiliations:
  • San Diego Supercomputer Center, University of California, San Diego, CA; Scientific Computing and Imaging Institute, University of Utah, Salt Lake City, UT; San Diego Supercomputer Center, University of California, San Diego, CA

  • Venue:
  • IPAW'06 Proceedings of the 2006 international conference on Provenance and Annotation of Data
  • Year:
  • 2006

Abstract

In many data-driven applications, analysis needs to be performed on scientific information obtained from several sources and generated by computations on distributed resources. Systematic analysis of this scientific information creates a growing need for automated data-driven applications that can also keep track of the provenance of the data and processes with little user interaction and overhead. Such data analysis can be facilitated by recent advancements in scientific workflow systems. A major benefit of using scientific workflow systems is the ability to make provenance collection a part of the workflow. Specifically, provenance should include not only the standard data lineage information but also information about the context in which the workflow was used, the execution that processed the data, and the evolution of the workflow design. In this paper we describe a complete framework for data and process provenance in the Kepler Scientific Workflow System. We outline the requirements and issues related to data and workflow provenance in a multi-disciplinary workflow system and introduce how generic provenance capture can be facilitated in Kepler's actor-oriented workflow environment. We also describe the use of the stored provenance information for efficient rerun of scientific workflows.