On Provenance Minimization

Authors:
Yael Amsterdamer;Daniel Deutch;Tova Milo;Val Tannen
Affiliations:
Tel Aviv University and University of Pennsylvania;Ben Gurion University and University of Pennsylvania;Tel Aviv University;University of Pennsylvania
Venue:
ACM Transactions on Database Systems (TODS)
Year:
2012

Abstract

Provenance information has proven very effective in capturing the computational process performed by queries, and has been used extensively as input to many advanced data management tools (e.g., view maintenance, trust assessment, or query answering in probabilistic databases). We observe here that while different (set-)equivalent queries may admit different provenance expressions when evaluated on the same database, there is always some part of these expressions that is common to all. We refer to this part as the core provenance. In addition to being informative, the core provenance is also useful as a compact input to the aforementioned data management tools. We formally define the notion of core provenance. We study algorithms that, given a query, compute an equivalent query, called p-minimal, such that for every input database the provenance of every result tuple is the core provenance. We study such algorithms for queries of varying expressive power (namely, conjunctive queries with disequalities and unions thereof). Finally, we observe that, in general, one would not want to require database systems to execute a specific p-minimal query, but rather to be able to find, possibly off-line, the core provenance of a given tuple in the output (computed by an arbitrary equivalent query) without reevaluating the query. We provide algorithms for such direct computation of the core provenance.
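
As an informal illustration of the observation that set-equivalent queries can yield different provenance expressions on the same database, the following Python sketch evaluates two equivalent conjunctive queries over a tiny annotated relation and collects the provenance of each answer as a polynomial (a multiset of monomials). Everything in it is an assumption made for illustration: the relation `R`, the tokens `p` and `q`, the two queries, and the helper names are hypothetical and are not taken from the paper. It does not implement the paper's definition of core provenance or its minimization algorithms.

```python
from collections import Counter
from itertools import product

# Hypothetical annotated relation R(x, y): each tuple carries a provenance token.
R = {("a", "b"): "p", ("a", "c"): "q"}


def monomial(tokens):
    """Canonical form of a product of tokens: a sorted tuple (repeats act as exponents)."""
    return tuple(sorted(tokens))


def eval_q1(rel):
    """Q1: ans(x) :- R(x, y). Maps each answer x to a Counter of monomials."""
    out = {}
    for (x, _y), tok in rel.items():
        out.setdefault(x, Counter())[monomial([tok])] += 1
    return out


def eval_q2(rel):
    """Q2: ans(x) :- R(x, y), R(x, z). Same answers as Q1 under set semantics."""
    out = {}
    for ((x1, _y), t1), ((x2, _z), t2) in product(rel.items(), repeat=2):
        if x1 == x2:  # join condition on the shared variable x
            out.setdefault(x1, Counter())[monomial([t1, t2])] += 1
    return out


prov1 = eval_q1(R)["a"]  # polynomial for answer "a" under Q1
prov2 = eval_q2(R)["a"]  # polynomial for answer "a" under Q2
print(dict(prov1))
print(dict(prov2))
```

Both queries return the same answer set, yet the redundant join in Q2 produces a larger polynomial for the same answer tuple. Informally, the terms of the Q1 polynomial reappear inside terms of the Q2 polynomial, which is the kind of shared part the abstract refers to.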