Data provenance is essential in applications such as scientific computing, curated databases, and data warehouses. Several systems have been developed that provide provenance functionality for the relational data model. These systems support only a subset of SQL, which is a severe limitation in practice because most application domains that benefit from provenance information use complex queries. Such queries typically involve nested subqueries, aggregation, and/or user-defined functions. Without support for these constructs, a provenance management system is of limited use. In this paper we address this limitation by exploring the problem of provenance derivation for complex queries. More precisely, we demonstrate that the widely used definition of Why-provenance fails in the presence of nested subqueries, and we show how the definition can be modified to produce meaningful results for such subqueries. We further present query rewrite rules that transform an SQL query into a query that propagates provenance. The solution introduced in this paper allows us to track provenance information for a far wider subset of SQL than any existing approach. We have incorporated these ideas into the engine of the Perm provenance management system and used it to evaluate the feasibility and performance of our approach.
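To make the idea of a provenance-propagating rewrite concrete, the following is a minimal sketch and not the rewrite rules of the paper. It assumes a hypothetical `sales` table, a simple aggregation query, and a hand-written rewritten query that joins each result row back to the input rows of its group. The `prov_sales_*` column names and the helper functions are illustrative assumptions. Nested subqueries, the actual subject of the paper, are deliberately left out because their provenance semantics is what the paper defines.

```python
import sqlite3


def build_db():
    """Create an in-memory database with a small, hypothetical sales table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount INTEGER);
        INSERT INTO sales VALUES (1, 'north', 10), (2, 'north', 30), (3, 'south', 5);
    """)
    return conn


# The original query: one row per region with its total.
ORIGINAL = """
SELECT region, SUM(amount) AS total
FROM sales
GROUP BY region
ORDER BY region
"""

# Hand-written rewritten form (an assumption for illustration): every result
# row is paired with each input row of its group, exposed as prov_sales_*.
REWRITTEN = """
SELECT q.region, q.total,
       s.id     AS prov_sales_id,
       s.region AS prov_sales_region,
       s.amount AS prov_sales_amount
FROM (SELECT region, SUM(amount) AS total FROM sales GROUP BY region) AS q
JOIN sales AS s ON s.region = q.region
ORDER BY q.region, s.id
"""


def run(conn, sql):
    """Execute a query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()


conn = build_db()
original_rows = run(conn, ORIGINAL)
provenance_rows = run(conn, REWRITTEN)

for row in provenance_rows:
    print(row)
```

In this sketch the rewritten query returns the original result columns unchanged and repeats each result row once per contributing input row, so the provenance can be stored and queried with ordinary SQL alongside the data.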