A graph model of data and workflow provenance

Authors:
Umut Acar;Peter Buneman;James Cheney;Jan Van Den Bussche;Natalia Kwasnikowska;Stijn Vansummeren
Affiliations:
Max-Planck Institute for Software Systems;University of Edinburgh;University of Edinburgh;Hasselt University;Hasselt University;Université Libre de Bruxelles
Venue:
TAPP'10 Proceedings of the 2nd conference on Theory and practice of provenance
Year:
2010

Abstract

Provenance has been studied extensively in both database and workflow management systems, but so far with little convergence of definitions or models. Provenance in databases has generally been defined for relational or complex-object data, by propagating fine-grained annotations or algebraic expressions from the input to the output. This kind of provenance has been found useful in other areas of computer science, including annotation databases, probabilistic databases, and schema and data integration. In contrast, workflow provenance aims to capture a complete description of the evaluation (or enactment) of a workflow, which is crucial to verification in scientific computation. Workflows and their provenance are often presented using graphical notation, which makes them easy to visualize but complicates the formal semantics that relates their run-time behavior to their provenance records. We bridge this gap by extending a previously developed dataflow language, which supports both database-style querying and workflow-style batch processing steps, so that it produces a workflow-style provenance graph that can be explicitly queried. We define and describe the model through examples, present queries that extract other forms of provenance, and give an executable definition of the graph semantics of dataflow expressions.
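
To make the idea of a queryable provenance graph concrete, the following is a minimal sketch, not the paper's model or its executable definition. It assumes a toy expression language with only variables, constants, and addition. The names `ProvGraph`, `evaluate`, and `ancestors` are hypothetical and chosen for illustration. Evaluating an expression records one graph node per value or step, and a reachability query then recovers which inputs the result depends on.

```python
from dataclasses import dataclass, field
from typing import Dict, Sequence, Set, Tuple


@dataclass
class ProvGraph:
    """Toy provenance graph (illustrative only, not the paper's definition)."""
    labels: Dict[int, str] = field(default_factory=dict)
    # Each edge is (node, node it was derived from).
    edges: Set[Tuple[int, int]] = field(default_factory=set)

    def add(self, label: str, deps: Sequence[int] = ()) -> int:
        """Create a node with the given label and dependency edges."""
        node = len(self.labels)
        self.labels[node] = label
        for dep in deps:
            self.edges.add((node, dep))
        return node

    def ancestors(self, node: int) -> Set[int]:
        """Return every node reachable by following dependency edges."""
        seen: Set[int] = set()
        stack = [node]
        while stack:
            current = stack.pop()
            for src, dst in self.edges:
                if src == current and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen


def evaluate(expr, env, graph):
    """Evaluate a tiny expression; return (value, provenance node id)."""
    kind = expr[0]
    if kind == "var":
        return env[expr[1]]  # inputs already carry their node id
    if kind == "const":
        return expr[1], graph.add(f"const {expr[1]}")
    if kind == "add":
        left_value, left_node = evaluate(expr[1], env, graph)
        right_value, right_node = evaluate(expr[2], env, graph)
        return left_value + right_value, graph.add("+", [left_node, right_node])
    raise ValueError(f"unknown expression kind: {kind}")


graph = ProvGraph()
env = {
    "x": (3, graph.add("input x")),
    "y": (4, graph.add("input y")),
    "z": (10, graph.add("input z")),  # never used by the expression below
}

# x + (y + 1)
value, out = evaluate(
    ("add", ("var", "x"), ("add", ("var", "y"), ("const", 1))), env, graph
)

# Query the graph: which inputs does the output depend on?
inputs_used = sorted(
    graph.labels[n] for n in graph.ancestors(out)
    if graph.labels[n].startswith("input")
)
print(value, inputs_used)
```

In this sketch the unused input `z` does not appear in the query result, which is the kind of dependency information a provenance query is meant to expose. A real dataflow language would also need collections, iteration, and calls to external processing steps, all of which are omitted here.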