Provenance query evaluation: what's so special about it?

Authors:
Anastasios Kementsietsidis;Min Wang
Affiliations:
IBM T.J. Watson Research Center, Hawthorne, NY, USA;IBM T.J. Watson Research Center, Hawthorne, NY, USA
Venue:
Proceedings of the 18th ACM conference on Information and knowledge management
Year:
2009

Abstract

While provenance has been extensively studied in the literature, the efficient evaluation of provenance queries remains an open problem. Traditional query optimization techniques, such as general-purpose indexes or the materialization of provenance data, each fail on some front to address the problem. The need for provenance-aware access methods therefore becomes apparent. This paper starts by identifying key requirements that are largely specific to provenance queries and are necessary for their efficient evaluation. The first property, called duality, requires that a single access method be used to evaluate both backward provenance queries (which input items of some analysis generate a given output item?) and forward provenance queries (which output items of some analysis does a given input item generate?). The second property, called locality, guarantees that provenance query evaluation times depend mainly on the size of the query results and are largely independent of the total size of the provenance data. Motivated by these requirements, we identify data structures that have both properties, implement them, and illustrate their effectiveness on provenance query evaluation through a detailed set of experiments.
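
To make the two properties concrete, the following is a minimal illustrative sketch. It is not the data structure proposed in the paper, which the abstract does not describe. It assumes provenance can be modeled as a set of (input item, output item) pairs, and the class name `ProvenanceIndex` and its methods are hypothetical. A single structure answers both query directions (duality), and each lookup touches only the entries for the queried item, so its cost follows the size of the answer and not the total number of stored pairs (locality).

```python
from collections import defaultdict


class ProvenanceIndex:
    """Toy bidirectional index over (input, output) provenance pairs.

    Hypothetical illustration only: two hash maps kept in sync behind
    one interface, so both query directions use the same access method.
    """

    def __init__(self):
        self._inputs_of = defaultdict(set)   # output item -> inputs that generated it
        self._outputs_of = defaultdict(set)  # input item -> outputs it generated

    def add(self, input_item, output_item):
        """Record that input_item contributed to output_item."""
        self._inputs_of[output_item].add(input_item)
        self._outputs_of[input_item].add(output_item)

    def backward(self, output_item):
        """Backward query: which inputs generated this output?"""
        # .get avoids creating an empty entry for unknown items.
        return set(self._inputs_of.get(output_item, ()))

    def forward(self, input_item):
        """Forward query: which outputs did this input generate?"""
        return set(self._outputs_of.get(input_item, ()))


index = ProvenanceIndex()
for src, dst in [("i1", "o1"), ("i2", "o1"), ("i2", "o2")]:
    index.add(src, dst)

print(sorted(index.backward("o1")))  # inputs behind output o1
print(sorted(index.forward("i2")))   # outputs derived from input i2
```

The sketch trades space for simplicity by storing every pair twice; a compact representation that supports both directions without this duplication would be a different design.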