Examining Statistics of Workflow Evolution Provenance: A First Study

  • Authors:
  • Lauro Lins; David Koop; Erik W. Anderson; Steven P. Callahan; Emanuele Santos; Carlos E. Scheidegger; Juliana Freire; Cláudio T. Silva

  • Affiliations:
  • SCI Institute & School of Computing, University of Utah (all authors)

  • Venue:
  • SSDBM '08: Proceedings of the 20th International Conference on Scientific and Statistical Database Management
  • Year:
  • 2008

Abstract

Provenance (also referred to as audit trail, lineage, and pedigree) captures information about the steps used to generate a given data product. Such information provides documentation that is key to determining data quality and authorship, and necessary for preserving, reproducing, sharing, and publishing the data. Workflow design, in particular for exploratory tasks (e.g., creating a visualization, mining a data set), requires an involved, trial-and-error process. To solve a problem, a user has to iteratively refine a workflow to experiment with different techniques and try different parameter values, as she formulates and tests hypotheses. The maintenance of detailed provenance (or history) of this process has many benefits that go beyond documentation and result reproducibility. Notably, it supports several operations that facilitate exploration, including the ability to return to a previous workflow version in an intuitive way, to undo bad changes, to compare different workflows, and to be reminded of the actions that led to a particular result [2].