From computation models to models of provenance: the RWS approach

  • Authors:
  • Bertram Ludäscher; Norbert Podhorszki; Ilkay Altintas; Shawn Bowers; Timothy McPhillips

  • Affiliations:
  • Department of Computer Science, University of California, Davis, CA, U.S.A. and Genome Center, University of California, Davis, CA, U.S.A.; Department of Computer Science, University of California, Davis, CA, U.S.A.; San Diego Supercomputer Center, UC, San Diego, CA, U.S.A.; Genome Center, University of California, Davis, CA, U.S.A.; Genome Center, University of California, Davis, CA, U.S.A.

  • Venue:
  • Concurrency and Computation: Practice & Experience - The First Provenance Challenge
  • Year:
  • 2008

Abstract

Scientific workflows often benefit from or even require advanced modeling constructs, e.g. nesting of subworkflows, cycles for executing loops, data-dependent routing, and pipelined execution. In such settings, an often overlooked aspect of provenance takes center stage: a suitable model of provenance (MoP) for scientific workflows should be based upon the underlying model of computation (MoC) used for executing the workflows. We can derive an adequate MoP from a MoC (such as Kahn's process networks) by taking into account the assumptions that a MoC entails, and by recording the observables which it affords. In this way, a MoP captures or at least better approximates ‘real’ data dependencies for workflows with advanced modeling constructs. As a specific instance, we elaborate on the Read–Write–ReSet model, a simple and flexible MoP suitable for a number of different MoCs. Copyright © 2007 John Wiley & Sons, Ltd.
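
The idea of recording a MoC's observables and deriving data dependencies from them can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not state: that the recorded events are per-actor read, write, and reset events, and that a written token is taken to depend on every token the same actor has read since its last reset. The function name, the trace format, and the token names are hypothetical and serve only as illustration.

```python
from collections import defaultdict


def rws_dependencies(trace):
    """Infer token dependencies from a trace of (actor, event, token) tuples.

    Assumed rule (illustrative, not taken from the abstract): a token written
    by an actor depends on all tokens that actor has read since its last reset.
    `event` is "read", "write", or "reset"; `token` is None for "reset".
    """
    reads_since_reset = defaultdict(list)  # actor -> tokens read since last reset
    deps = {}                              # written token -> tokens it depends on
    for actor, event, token in trace:
        if event == "read":
            reads_since_reset[actor].append(token)
        elif event == "write":
            # Copy the list so later reads do not alter recorded dependencies.
            deps[token] = list(reads_since_reset[actor])
        elif event == "reset":
            reads_since_reset[actor].clear()
        else:
            raise ValueError(f"unknown event: {event!r}")
    return deps


# Hypothetical trace for a single actor "A".
trace = [
    ("A", "read", "t1"),
    ("A", "read", "t2"),
    ("A", "write", "t3"),
    ("A", "reset", None),
    ("A", "read", "t4"),
    ("A", "write", "t5"),
]
deps = rws_dependencies(trace)
print(deps)
```

Under these assumptions, the reset event is what keeps `t5` from depending on `t1` and `t2`. Without it, every output of a long-running actor would appear to depend on all of its earlier inputs, which overestimates the ‘real’ data dependencies.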