E-science applications use fine-grained data provenance to keep scientific results reproducible: for each processed data tuple, both the source data and the processing step used to derive it are documented. Since most e-science applications process sensor data on-line using overlapping time windows, each data item contributes to many windows, so the overhead of maintaining fine-grained provenance becomes substantial, especially in longer data processing chains. In this paper, we propose an approach that reduces the storage cost of fine-grained data provenance by maintaining provenance at the relation level instead of the tuple level while keeping the content of the underlying database reproducible. The approach has been prototypically implemented for both streaming and manually sampled data.
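The storage trade-off described above can be sketched with a simple back-of-the-envelope model (an illustrative assumption, not the authors' implementation; the function names and cost formulas are hypothetical). With tuple-level provenance, every output window stores one provenance entry per input tuple it consumed, so overlapping windows duplicate entries; with relation-level provenance, each output stores only its window boundaries, and the input relation is kept once in timestamped (versioned) form so the exact window contents remain reproducible.

```python
def tuple_level_provenance(n_tuples: int, window: int, slide: int) -> int:
    """Count provenance entries when each output window records a link
    to every input tuple it used (tuple-level provenance)."""
    entries = 0
    start = 0
    while start + window <= n_tuples:
        entries += window      # one entry per tuple in the window
        start += slide
    return entries

def relation_level_provenance(n_tuples: int, window: int, slide: int) -> int:
    """Count stored records when each output keeps only its window
    boundaries, plus one timestamped copy of each input tuple so the
    window contents can be reconstructed (relation-level provenance)."""
    n_windows = 0
    start = 0
    while start + window <= n_tuples:
        n_windows += 1         # one boundary record per window
        start += slide
    return n_windows + n_tuples  # boundaries + versioned input relation

# For 1000 input tuples, windows of 100 tuples sliding by 10:
print(tuple_level_provenance(1000, 100, 10))     # 9100 entries
print(relation_level_provenance(1000, 100, 10))  # 1091 records
```

The gap widens as the overlap grows (smaller slide relative to window size), which is exactly the regime the abstract identifies as problematic for tuple-level provenance.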