Provenance and the Price of Identity

Authors:
Adriane Chapman;H. V. Jagadish
Affiliations:
University of Michigan, Ann Arbor, USA;University of Michigan, Ann Arbor, USA
Venue:
Provenance and Annotation of Data and Processes
Year:
2008

Abstract

As developers acknowledge that provenance is essential, more and more datasets are attempting to keep provenance records describing how they were created. Some of these datasets are constructed using workflows, while others cobble together processes and applications to manipulate the data. The provenance needs are the same in both cases (the inputs and the set of processes used must be kept), but the identity needs are very different. We outline several identification strategies that can be used for data manipulation outside of workflows. We evaluate these strategies in terms of the time needed to create and store identity and the space needed to keep this information. Additionally, we discuss the strengths and weaknesses of each strategy.
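
To make the time-and-space trade-off concrete, the following is a minimal sketch of how two identification strategies might be compared. The abstract does not name the strategies the paper evaluates, so both shown here (a content hash and a randomly generated identifier) are assumptions chosen only for illustration, as are the function names and the measurement helper.

```python
import hashlib
import time
import uuid


def content_hash_id(data: bytes) -> str:
    """Hypothetical strategy: identity derived from the bytes themselves.

    Identical content always yields the same identifier, but the cost
    grows with the size of the data because every byte is read.
    """
    return hashlib.sha256(data).hexdigest()


def generated_id(data: bytes) -> str:
    """Hypothetical strategy: identity assigned at creation time.

    The content is ignored, so creation cost does not depend on data
    size, but two copies of the same data get different identifiers.
    """
    return uuid.uuid4().hex


def measure(strategy, data: bytes, repeats: int = 20):
    """Return (identifier, average seconds per call, identifier size in bytes)."""
    start = time.perf_counter()
    for _ in range(repeats):
        ident = strategy(data)
    elapsed = (time.perf_counter() - start) / repeats
    return ident, elapsed, len(ident.encode("ascii"))


if __name__ == "__main__":
    payload = b"x" * 1_000_000  # stand-in for a dataset outside any workflow
    for strategy in (content_hash_id, generated_id):
        ident, seconds, size = measure(strategy, payload)
        print(f"{strategy.__name__}: {seconds:.6f} s per id, {size} bytes stored")
```

The sketch only measures identifier creation in memory. Storing the identifier alongside the data, which the abstract also counts as part of the cost, would need to be timed separately against whatever storage is in use.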