We consider a general workflow setting in which input data sets are processed by a graph of transformations to produce output results. Our goal is to perform efficient selective refresh of elements in the output data, i.e., to compute the latest values of specific output elements when the input data may have changed. We explore how data provenance can be used to enable efficient refresh. Our approach is based on capturing one-level data provenance at each transformation when the workflow is initially run. At refresh time, provenance is used to determine (transitively) which input elements are responsible for given output elements, and the workflow is rerun only on the portion of the data needed for refresh. Our contributions are to formalize the problem setting and the problem itself, to specify the properties of transformations and provenance that are required for efficient refresh, and to provide algorithms that apply to a wide class of transformations and workflows. We have built a prototype system supporting the features and algorithms presented in the paper. We report preliminary experimental results on the overhead of provenance capture and on the crossover point between selective refresh and full workflow recomputation.
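The idea of capturing one-level provenance per transformation and then tracing it backwards at refresh time can be illustrated with a small sketch. This is not the paper's algorithm or prototype; it is a minimal illustration under stated assumptions. The two transformations (`to_dollars`, `sum_by_region`), the record layout, and all function names are hypothetical. The sketch also assumes that inputs are only updated in place (no insertions or deletions) and that an update never changes a record's region; without such properties, rerunning on the traced subset alone would not be guaranteed to give the correct refreshed value.

```python
from collections import defaultdict

def to_dollars(records):
    """Stage 1 (hypothetical): one-to-one map from cents to dollars.
    Returns the outputs and one-level provenance: output id -> set of input ids."""
    out, prov = {}, {}
    for rid, (region, cents) in records.items():
        oid = ("t1", rid)
        out[oid] = (region, cents / 100)
        prov[oid] = {rid}
    return out, prov

def sum_by_region(records):
    """Stage 2 (hypothetical): group-by aggregate.
    Each output's provenance is the set of stage-1 elements in its group."""
    groups = defaultdict(list)
    for rid, (region, dollars) in records.items():
        groups[region].append((rid, dollars))
    out, prov = {}, {}
    for region, members in groups.items():
        out[region] = sum(d for _, d in members)
        prov[region] = {rid for rid, _ in members}
    return out, prov

def run_workflow(inputs):
    """Run both stages, keeping the one-level provenance of each stage."""
    mid, prov1 = to_dollars(inputs)
    result, prov2 = sum_by_region(mid)
    return result, [prov1, prov2]

def trace_to_inputs(output_id, provs):
    """Follow one-level provenance backwards, stage by stage, to input ids."""
    frontier = {output_id}
    for prov in reversed(provs):
        frontier = set().union(*(prov[e] for e in frontier))
    return frontier

def selective_refresh(output_id, provs, current_inputs):
    """Rerun the workflow only on the inputs responsible for output_id.
    Assumes in-place updates that do not move records between regions."""
    needed = trace_to_inputs(output_id, provs)
    subset = {rid: current_inputs[rid] for rid in needed}
    result, _ = run_workflow(subset)
    return result[output_id]

# Initial run, then an in-place update to the inputs.
inputs = {1: ("east", 1250), 2: ("west", 400), 3: ("east", 750)}
result, provs = run_workflow(inputs)
inputs[3] = ("east", 1750)
refreshed_east = selective_refresh("east", provs, inputs)
```

In this sketch, refreshing the `"east"` total reprocesses only records 1 and 3; record 2 is never read. Whether this is cheaper than a full rerun depends on how much of the input the traced subset covers, which is the crossover question the abstract mentions.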