Provenance artifact identification in the atmospheric composition processing system (ACPS)
TAPP'10 Proceedings of the 2nd conference on Theory and practice of provenance
As developers acknowledge that provenance is essential, more and more datasets keep provenance records describing how they were created. Some of these datasets are constructed using workflows; others cobble together processes and applications to manipulate the data. While the provenance needs are the same in both cases (the inputs and the set of processes used must be kept), the identity needs are very different. We outline several identification strategies that can be used for data manipulation outside of workflows. We evaluate these strategies in terms of the time needed to create and store an identity and the space needed to keep this information. Additionally, we discuss the strengths and weaknesses of each strategy.