Database support for exploring scientific workflow provenance graphs

Authors:
Manish Kumar Anand; Shawn Bowers; Bertram Ludäscher
Affiliations:
Microsoft Corporation, Redmond, WA; Dept. of Computer Science, Gonzaga University, Spokane, WA; Dept. of Computer Science, University of California, Davis, CA
Venue:
SSDBM '12: Proceedings of the 24th International Conference on Scientific and Statistical Database Management
Year:
2012

Abstract

Provenance graphs generated by real-world scientific workflows often contain large numbers of nodes and edges representing various types of provenance information. A standard approach in workflow systems is to present provenance visually by displaying the entire (static) provenance graph. This approach makes it difficult for users to find relevant information and to explore and analyze data and process dependencies. We address these issues through a set of abstractions that allow users to construct specialized views of provenance graphs. Our model provides operations to expand, collapse, filter, group, and summarize all or part of a provenance graph in order to construct tailored provenance views. A unique feature of the model is that it can be implemented using standard relational database technology, which offers advantages both for supporting existing provenance frameworks and for the efficiency and scalability of the model. We present and formalize the operations of the model as a set of relational queries expressed against an underlying provenance schema. We also present a detailed experimental evaluation, using provenance graphs generated from a number of scientific workflows, that demonstrates the feasibility and efficiency of our approach.
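To make the idea of "view operations as relational queries" concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not the authors' schema or query set: the two-table schema (node, edge), the toy trace, and every function name (build_example, filter_nodes, expand, lineage, summarize_by_actor, collapse_by_actor) are hypothetical and chosen only for illustration. It shows how filter, expand, transitive navigation, grouping/summarizing, and collapsing can each be written as a plain SQL query over stored provenance edges.

```python
import sqlite3

# Hypothetical provenance schema (an illustrative assumption, not the schema
# used in the paper): a node is either a data item or a process invocation,
# and each edge (src, dst) records that src depends on dst.
SCHEMA = """
CREATE TABLE node (id TEXT PRIMARY KEY, kind TEXT NOT NULL, actor TEXT);
CREATE TABLE edge (src TEXT NOT NULL, dst TEXT NOT NULL);
"""


def build_example(conn):
    """Load a tiny made-up trace: two 'align' runs feed one 'tree' run."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO node VALUES (?, ?, ?)", [
        ("d1", "data", None), ("d2", "data", None),
        ("d3", "data", None), ("d4", "data", None),
        ("i1", "invocation", "align"), ("i2", "invocation", "align"),
        ("i3", "invocation", "tree"),
    ])
    conn.executemany("INSERT INTO edge VALUES (?, ?)", [
        ("i1", "d1"), ("d2", "i1"),                # i1 read d1, produced d2
        ("i2", "d1"), ("d3", "i2"),                # i2 read d1, produced d3
        ("i3", "d2"), ("i3", "d3"), ("d4", "i3"),  # i3 read d2, d3; produced d4
    ])


def filter_nodes(conn, kind):
    """Filter: keep only nodes of the given kind."""
    rows = conn.execute(
        "SELECT id FROM node WHERE kind = ? ORDER BY id", (kind,))
    return [r[0] for r in rows]


def expand(conn, node_id):
    """Expand: reveal the immediate dependencies of one node."""
    rows = conn.execute(
        "SELECT dst FROM edge WHERE src = ? ORDER BY dst", (node_id,))
    return [r[0] for r in rows]


def lineage(conn, node_id):
    """All transitive upstream dependencies, via a recursive query.
    UNION (not UNION ALL) removes duplicates, so shared ancestors such as
    d1 are visited once."""
    rows = conn.execute("""
        WITH RECURSIVE up(id) AS (
            SELECT dst FROM edge WHERE src = ?
            UNION
            SELECT e.dst FROM edge AS e JOIN up ON e.src = up.id
        )
        SELECT id FROM up ORDER BY id""", (node_id,))
    return [r[0] for r in rows]


def summarize_by_actor(conn):
    """Group/summarize: count invocations per actor."""
    return conn.execute("""
        SELECT actor, COUNT(*) FROM node
        WHERE kind = 'invocation' GROUP BY actor ORDER BY actor""").fetchall()


def collapse_by_actor(conn):
    """Collapse: merge all invocations of an actor into a single node named
    after the actor (data nodes keep their own id) and return the edges of
    the collapsed view. Assumes actor names never clash with node ids."""
    return conn.execute("""
        SELECT DISTINCT COALESCE(s.actor, s.id), COALESCE(t.actor, t.id)
        FROM edge AS e
        JOIN node AS s ON e.src = s.id
        JOIN node AS t ON e.dst = t.id
        ORDER BY 1, 2""").fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_example(conn)
    print(filter_nodes(conn, "invocation"))  # ['i1', 'i2', 'i3']
    print(expand(conn, "i3"))                # ['d2', 'd3']
    print(lineage(conn, "d4"))               # ['d1', 'd2', 'd3', 'i1', 'i2', 'i3']
    print(summarize_by_actor(conn))          # [('align', 2), ('tree', 1)]
    print(collapse_by_actor(conn))
```

Keeping the graph in ordinary tables means each view is just another query, so the database engine, rather than the visualization layer, handles joins, deduplication, and recursion. A production design would likely add indexes on edge(src) and edge(dst) and might materialize frequently used views; those choices are outside this sketch.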