As scientists continue to migrate their work to computational methods, it is important to track not only the steps involved in a computation but also the data consumed and produced. Existing approaches can capture this provenance information, but they often keep only weak references between data and provenance. When data files or provenance records are moved or modified, it can be difficult to find the data associated with a given provenance record, or the provenance associated with a given data file. We propose a persistent storage mechanism that manages input, intermediate, and output data files, strengthening the links between provenance and data. This mechanism provides better support for reproducibility because it ensures that the data referenced in provenance information can be readily located. Another important benefit of such management is that it allows intermediate data to be cached and then shared with other users. We present an implemented infrastructure for managing data in a provenance-aware manner and demonstrate its application in scientific projects.
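To make the idea of a managed store concrete, the following is a minimal sketch, not the infrastructure described above. It assumes content addressing: each file is stored under its SHA-256 digest, so a provenance record that stores the digest can still locate the data after the original file is moved or deleted. The abstract does not say that hashing is used; that choice, along with the names `ManagedStore`, `put`, `get`, and `run_cached`, is hypothetical and chosen only for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


class ManagedStore:
    """Hypothetical content-addressed file store (an illustrative assumption)."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        # Maps (step name, input digests) to the digest of the cached output.
        self.cache = {}

    def put_bytes(self, data):
        """Store raw bytes under their SHA-256 digest and return the digest."""
        digest = hashlib.sha256(data).hexdigest()
        target = self.root / digest
        if not target.exists():
            target.write_bytes(data)
        return digest

    def put(self, path):
        """Copy a file's contents into the store and return its digest."""
        return self.put_bytes(Path(path).read_bytes())

    def get(self, digest):
        """Return the managed path for a digest recorded in provenance."""
        target = self.root / digest
        if not target.exists():
            raise KeyError(digest)
        return target

    def run_cached(self, step_name, func, input_digests):
        """Run a step on managed inputs, reusing a cached output if one exists.

        Returns (output digest, whether the result came from the cache).
        """
        key = (step_name, tuple(input_digests))
        if key in self.cache:
            return self.cache[key], True
        inputs = [self.get(d).read_bytes() for d in input_digests]
        output_digest = self.put_bytes(func(*inputs))
        self.cache[key] = output_digest
        return output_digest, False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = ManagedStore(Path(tmp) / "store")
        raw = Path(tmp) / "input.txt"
        raw.write_bytes(b"3 1 2")
        in_digest = store.put(raw)
        raw.unlink()  # the original file is gone; the managed copy remains

        def sort_step(data):
            return b" ".join(sorted(data.split()))

        out1, hit1 = store.run_cached("sort", sort_step, [in_digest])
        out2, hit2 = store.run_cached("sort", sort_step, [in_digest])
        print(store.get(out1).read_bytes(), hit1, hit2)
```

In this sketch the second call to `run_cached` returns the stored intermediate result without re-running the step, which mirrors the caching benefit the abstract mentions. A shared deployment would need a persistent cache index and access control, both omitted here.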