Provenance information is vital in many application areas because it helps explain data lineage and derivation. However, storing fine-grained provenance information can be expensive. In this paper, we present a framework for storing provenance information for data derived via database queries. We first propose a provenance tree data structure that mirrors the query structure and thereby makes it possible to avoid storing redundant information about the derivation process. We then investigate two approaches for reducing storage costs. The first applies two reduction rules to provenance trees. The second is a dynamic programming solution that optimizes the selection of query tree nodes at which provenance information should be stored. The optimization algorithm runs in time polynomial in the query size and linear in the size of the provenance information, enabling provenance tracking and optimization without incurring large overheads. Experiments show that our approaches achieve significantly lower storage costs than existing approaches.
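To make the idea of choosing storage nodes in a query tree concrete, here is a minimal toy sketch in Python. It is not the paper's algorithm: the cost model (a node either stores provenance itself at a fixed `store_cost`, or delegates to all of its children, and leaves must always store) is an assumption made purely for illustration, as are the names `QueryNode` and `choose_storage_nodes`.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class QueryNode:
    """A node of a hypothetical query tree (illustrative only)."""
    name: str
    store_cost: int  # assumed cost of storing provenance at this node
    children: List["QueryNode"] = field(default_factory=list)


def choose_storage_nodes(node: QueryNode) -> Tuple[int, List[str]]:
    """Bottom-up dynamic program over the tree.

    Assumed toy rule: provenance is stored either at the node itself or,
    alternatively, in every child subtree. Returns the minimum total cost
    and the names of the nodes chosen for storage.
    """
    if not node.children:
        # A leaf has nothing to delegate to, so it must store.
        return node.store_cost, [node.name]

    child_total = 0
    child_choice: List[str] = []
    for child in node.children:
        cost, chosen = choose_storage_nodes(child)
        child_total += cost
        child_choice.extend(chosen)

    # Keep whichever option is cheaper for this subtree.
    if node.store_cost <= child_total:
        return node.store_cost, [node.name]
    return child_total, child_choice


# Toy query tree: project(join(scan_R, scan_S)) with made-up costs.
scan_r = QueryNode("scan_R", 4)
scan_s = QueryNode("scan_S", 3)
join = QueryNode("join", 10, [scan_r, scan_s])
project = QueryNode("project", 9, [join])

cost, chosen = choose_storage_nodes(project)
print(cost, chosen)
```

Each node is visited once, so this toy version runs in time linear in the number of query tree nodes. A realistic cost model would depend on the actual provenance sizes at each operator, which this sketch does not attempt to capture.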