Workflow provenance typically assumes that each module is a black box, so that each output depends on all inputs (coarse-grained dependencies). Furthermore, it does not model the internal state of a module, which can change between repeated executions. In practice, however, an output may depend on only a small subset of the inputs (fine-grained dependencies) as well as on the internal state of the module. We present a novel provenance framework that marries database-style and workflow-style provenance by using Pig Latin to expose the functionality of modules, thus capturing internal state and fine-grained dependencies. A critical ingredient in our solution is a novel form of provenance graph that models module invocations and yields a compact representation of fine-grained workflow provenance. This graph also enables a number of novel graph transformation operations: it allows users to choose the desired level of granularity in provenance querying (ZoomIn and ZoomOut), and it supports "what-if" workflow analytic queries. We implemented our approach in the Lipstick system and developed a benchmark in support of a systematic performance evaluation. Our results demonstrate the feasibility of tracking and querying fine-grained workflow provenance.
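To make the distinction between coarse-grained and fine-grained dependencies concrete, the following is a minimal Python sketch under stated assumptions. Everything in it is hypothetical and for illustration only: the tuple ids, the `SumByKeyModule` class (loosely mimicking a Pig Latin GROUP/FOREACH/SUM step whose running totals act as internal state), and the `zoom_out` helper, which is a simplified node-collapsing operation and not Lipstick's actual ZoomOut algorithm or API.

```python
from collections import defaultdict

# Hypothetical input tuples: (tuple_id, key, value). Illustrative data only.
INPUTS = [
    ("t1", "a", 3),
    ("t2", "b", 5),
    ("t3", "a", 4),
]


def coarse_grained(inputs, outputs):
    """Black-box view: every output is recorded as depending on every input."""
    all_ids = {tid for tid, _, _ in inputs}
    return {out: set(all_ids) for out in outputs}


class SumByKeyModule:
    """Hypothetical stateful module, loosely like GROUP ... FOREACH ... SUM.

    Running totals persist across invocations, so a later invocation's output
    can depend on tuples consumed by an earlier invocation (internal state).
    """

    def __init__(self):
        self.totals = defaultdict(int)        # internal state: key -> running sum
        self.contributors = defaultdict(set)  # key -> ids of tuples folded into that sum

    def invoke(self, inputs):
        """Return {key: (running_sum, fine-grained provenance set)} for touched keys."""
        touched = set()
        for tid, key, value in inputs:
            self.totals[key] += value
            self.contributors[key].add(tid)
            touched.add(key)
        return {key: (self.totals[key], set(self.contributors[key]))
                for key in sorted(touched)}


def zoom_out(edges, members, invocation):
    """Collapse the nodes in `members` into one `invocation` node.

    Edges internal to the collapsed group disappear; edges crossing its
    boundary are redirected to the single invocation node.
    """
    collapsed = set()
    for src, dst in edges:
        s = invocation if src in members else src
        d = invocation if dst in members else dst
        if s != d:
            collapsed.add((s, d))
    return collapsed
```

In this sketch, the first invocation attributes output `a` only to `t1` and `t3`, whereas the coarse-grained view attributes every output to all three inputs. A second invocation on a new tuple `t4` yields a sum for `a` that still depends on `t1` and `t3` through the module's retained state. Collapsing the per-group nodes with `zoom_out` illustrates the trade-off behind choosing a granularity: the graph becomes smaller, but `out_a` is then reachable from `t2` even though it never depended on it.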