A declarative approach to customize workflow provenance

  • Authors:
  • Saumen Dey;Bertram Ludascher

  • Affiliations:
  • University of California at Davis, Davis, California;University of California at Davis, Davis, California

  • Venue:
  • Proceedings of the Joint EDBT/ICDT 2013 Workshops
  • Year:
  • 2013

Abstract

Provenance describes the origin, context, derivation, and ownership of data products and is becoming increasingly important in scientific applications. This information can be used, e.g., to explain, debug, and reproduce the results of computational experiments, or to determine the validity and quality of data products. However, it may be infeasible or undesirable to share the complete provenance of a scientific experiment. To balance these requirements, we develop a framework and a system that allow scientists to declaratively specify their provenance data publication and customization requirements. Using this system, scientists can specify which parts of the provenance data are to be included in the result and which parts should be hidden or anonymized. Arbitrary application of these specifications, though, may not maintain provenance data integrity. Thus, we allow scientists to specify provenance data integrity requirements, in the form of provenance policies, along with their publication and customization requirements. Our system then systematically applies all the publication and customization requirements to the provenance data and ensures that all provenance policies specified by the scientist are satisfied.
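
The tension between customization and integrity described in the abstract can be illustrated with a minimal sketch. It is not the authors' system or their specification language. It assumes a toy provenance graph stored as a set of (derived, source) edges, and every function name here (hide, hide_with_bridging, anonymize, lineage_preserved) is hypothetical. The single policy checked, that visible items keep their transitive dependencies on other visible items, is likewise an assumed example of what a provenance policy might require.

```python
def reachable(edges, start):
    """Return all nodes reachable from start by following edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for src, dst in edges:
            if src == node and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def hide(edges, hidden):
    """Naive hiding: drop every edge that touches a hidden node."""
    return {(s, d) for s, d in edges if s not in hidden and d not in hidden}


def hide_with_bridging(edges, hidden):
    """Hide nodes but reconnect their predecessors to their successors
    (an assumed repair strategy, used here only for illustration)."""
    result = set(edges)
    for h in sorted(hidden):
        preds = {s for s, d in result if d == h and s != h}
        succs = {d for s, d in result if s == h and d != h}
        result = {(s, d) for s, d in result if h not in (s, d)}
        result |= {(p, q) for p in preds for q in succs if p != q}
    return result


def anonymize(edges, names):
    """Replace each requested node name with an opaque label."""
    alias = {n: f"anon{i}" for i, n in enumerate(sorted(names), start=1)}
    return {(alias.get(s, s), alias.get(d, d)) for s, d in edges}


def lineage_preserved(original, published, hidden):
    """Assumed policy: every dependency between two visible nodes in the
    original graph must still be derivable from the published graph."""
    visible = {n for edge in original for n in edge} - set(hidden)
    for node in visible:
        expected = reachable(original, node) & visible
        if not expected <= reachable(published, node):
            return False
    return True


# A plot derived from a table, which was derived from raw data.
original = {("plot", "table"), ("table", "raw")}

naive = hide(original, {"table"})
bridged = hide_with_bridging(original, {"table"})
published = anonymize(bridged, {"raw"})

naive_ok = lineage_preserved(original, naive, {"table"})
bridged_ok = lineage_preserved(original, bridged, {"table"})
```

In this sketch, naively hiding the intermediate table removes both edges, so the published graph no longer shows that the plot depends on the raw data and the assumed policy fails. The bridging variant keeps that dependency and passes the check, which mirrors the abstract's point that customization requests need to be applied together with integrity policies rather than in isolation.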