Automatic capture and reconstruction of computational provenance

  • Authors:
  • James Frew; Dominic Metzger; Peter Slaughter

  • Affiliations:
  • Donald Bren School of Environmental Science and Management, University of California, Santa Barbara, CA 93106-5131, U.S.A. (all authors)

  • Venue:
  • Concurrency and Computation: Practice & Experience - The First Provenance Challenge
  • Year:
  • 2008

Abstract

The Earth System Science Server (ES3) project is developing a local infrastructure for managing Earth science data products derived from satellite remote sensing. By ‘local,’ we mean the infrastructure that a scientist uses to manage the creation and dissemination of her own data products, particularly those that are constantly incorporating corrections or improvements based on the scientist's own research. Therefore, in addition to being robust and capacious enough to support public access, ES3 is intended to be flexible enough to manage the idiosyncratic computing ensembles that typify scientific research. Instead of specifying provenance explicitly with a workflow model, ES3 extracts provenance information automatically from arbitrary applications by monitoring their interactions with their execution environment. These interactions (arguments, file I/O, system calls, etc.) are logged to the ES3 database, which assembles them into provenance graphs. These graphs resemble workflow specifications, but are really reports: they describe what actually happened, as opposed to what was requested. The ES3 database supports forward and backward navigation through provenance graphs (i.e., ancestor-descendant queries), as well as graph retrieval. Copyright © 2007 John Wiley & Sons, Ltd.
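
The idea of assembling logged interactions into a provenance graph and then navigating it forward and backward can be illustrated with a minimal sketch. This is not the ES3 implementation or its database schema: the `ProvenanceGraph` class, its `record` method, and the file and process names are all hypothetical, and the sketch assumes each logged event simply lists the files a process read and wrote.

```python
from collections import defaultdict, deque


class ProvenanceGraph:
    """Hypothetical in-memory provenance graph (not the ES3 database)."""

    def __init__(self):
        self.children = defaultdict(set)  # node -> nodes derived from it
        self.parents = defaultdict(set)   # node -> nodes it was derived from

    def record(self, process, inputs, outputs):
        """Log one observed process run: the files it read and wrote."""
        for name in inputs:
            self._add_edge(name, process)
        for name in outputs:
            self._add_edge(process, name)

    def _add_edge(self, src, dst):
        self.children[src].add(dst)
        self.parents[dst].add(src)

    def _reach(self, start, adjacency):
        """Breadth-first search returning every node reachable from start."""
        seen = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in adjacency.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def ancestors(self, node):
        """Backward navigation: everything that contributed to node."""
        return self._reach(node, self.parents)

    def descendants(self, node):
        """Forward navigation: everything derived from node."""
        return self._reach(node, self.children)


# Hypothetical log of two observed process runs.
graph = ProvenanceGraph()
graph.record("proc:calibrate", inputs=["raw.dat"], outputs=["calibrated.dat"])
graph.record("proc:regrid", inputs=["calibrated.dat", "grid.cfg"],
             outputs=["product.nc"])

print(sorted(graph.ancestors("product.nc")))
print(sorted(graph.descendants("raw.dat")))
```

Because the graph is built only from what was observed, it acts as a report of what actually ran rather than a specification of what was requested; an ancestor query on `product.nc` returns both processes and all three upstream files.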