Exploring Scientific Workflow Provenance Using Hybrid Queries over Nested Data and Lineage Graphs

Authors:
Manish Kumar Anand;Shawn Bowers;Timothy McPhillips;Bertram Ludäscher
Affiliations:
Department of Computer Science, University of California, Davis;UC Davis Genome Center, University of California, Davis, and Department of Computer Science, Gonzaga University;UC Davis Genome Center, University of California, Davis;Department of Computer Science, University of California, Davis, and UC Davis Genome Center, University of California, Davis
Venue:
SSDBM 2009 Proceedings of the 21st International Conference on Scientific and Statistical Database Management
Year:
2009

Abstract

Existing approaches for representing the provenance of scientific workflow runs largely ignore computation models that work over structured data, including XML. Unlike models based on transformation semantics, these computation models often employ update semantics, in which only a portion of an incoming XML stream is modified by each workflow step. Applying conventional provenance approaches to such models results in provenance information that is either too coarse (e.g., stating that one version of an XML document depends entirely on a prior version) or potentially incorrect (e.g., stating that each element of an XML document depends on every element in a prior version). We describe a generic provenance model that naturally represents workflow runs involving processes that work over nested data collections and that employ update semantics. Moreover, we extend current query approaches to support our model, enabling queries to be posed not only over data lineage relationships, but also over versions of nested data structures produced during a workflow run. We show how hybrid queries can be expressed against our model using high-level query constructs and implemented efficiently over relational provenance storage schemes.
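
To make the idea of a hybrid query concrete, the following is a minimal sketch, not the authors' model or storage scheme. It assumes a hypothetical relational layout with two tables: `node` (each data item and the collection it is nested in) and `dep` (fine-grained lineage edges recorded only for items a step actually creates, in the spirit of update semantics). All table names, column names, item names, and step names (`align`, `infer_tree`) are invented for illustration, and versioning of the nested structure is left out for brevity.

```python
import sqlite3

# Hypothetical schema: NOT the schema from the paper, only an illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (id TEXT PRIMARY KEY, parent TEXT, created_by TEXT);
CREATE TABLE dep  (target TEXT, source TEXT);
""")

# Nested data: doc contains proj (two sequences), a note, and results.
conn.executemany("INSERT INTO node VALUES (?, ?, ?)", [
    ("doc", None, "input"),
    ("proj", "doc", "input"),
    ("seq1", "proj", "input"),
    ("seq2", "proj", "input"),
    ("note", "doc", "input"),          # passes through every step untouched
    ("results", "doc", "align"),
    ("aln", "results", "align"),       # inserted by the (made-up) align step
    ("tree1", "results", "infer_tree"),
])

# Fine-grained lineage: only the items a step really read are recorded.
conn.executemany("INSERT INTO dep VALUES (?, ?)", [
    ("aln", "seq1"),
    ("aln", "seq2"),
    ("tree1", "aln"),
])


def lineage(conn, item):
    """All items that `item` transitively depends on (lineage only)."""
    rows = conn.execute("""
        WITH RECURSIVE anc(id) AS (
            SELECT source FROM dep WHERE target = ?
            UNION
            SELECT dep.source FROM dep JOIN anc ON dep.target = anc.id
        )
        SELECT id FROM anc ORDER BY id
    """, (item,)).fetchall()
    return [r[0] for r in rows]


def hybrid(conn, item, collection):
    """Lineage of `item`, restricted to items nested under `collection`."""
    rows = conn.execute("""
        WITH RECURSIVE
        anc(id) AS (
            SELECT source FROM dep WHERE target = :item
            UNION
            SELECT dep.source FROM dep JOIN anc ON dep.target = anc.id
        ),
        sub(id) AS (
            SELECT id FROM node WHERE parent = :coll
            UNION
            SELECT node.id FROM node JOIN sub ON node.parent = sub.id
        )
        SELECT id FROM anc WHERE id IN (SELECT id FROM sub) ORDER BY id
    """, {"item": item, "coll": collection}).fetchall()
    return [r[0] for r in rows]


print(lineage(conn, "tree1"))
print(hybrid(conn, "tree1", "proj"))
```

In this toy data, `note` never appears in the lineage of `tree1`, whereas a coarse document-level approach would report that the whole output document depends on the whole input document. The `hybrid` function combines a structural condition (nesting under `proj`) with a lineage condition (transitive dependency), which is the kind of combined query the abstract refers to.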