Database support for exploring scientific workflow provenance graphs

  • Authors:
  • Manish Kumar Anand;Shawn Bowers;Bertram Ludäscher

  • Affiliations:
  • Microsoft Corporation, Redmond, WA;Dept. of Computer Science, Gonzaga University, Spokane, WA;Dept. of Computer Science, University of California, Davis, CA

  • Venue:
  • SSDBM'12 Proceedings of the 24th international conference on Scientific and Statistical Database Management
  • Year:
  • 2012

Abstract

Provenance graphs generated from real-world scientific workflows often contain large numbers of nodes and edges denoting various types of provenance information. A standard approach used by workflow systems is to present provenance information visually by displaying an entire (static) provenance graph. This approach makes it difficult for users to find relevant information and to explore and analyze data and process dependencies. We address these issues through a set of abstractions that allow users to construct specialized views of provenance graphs. Our model provides operations that allow users to expand, collapse, filter, group, and summarize all or portions of provenance graphs to construct tailored provenance views. A unique feature of the model is that it can be implemented using standard relational database technology, which offers advantages both in supporting existing provenance frameworks and in the efficiency and scalability of the model. We present and formalize the operations within the model as a set of relational queries expressed against an underlying provenance schema. We also present a detailed experimental evaluation that demonstrates the feasibility and efficiency of our approach against provenance graphs generated from a number of scientific workflows.
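To make the idea of view operations as relational queries more concrete, the following is a minimal sketch only, not the schema or queries from the paper. It assumes a hypothetical two-table layout (`node` and `edge`), invented node identifiers and group labels, and uses Python's built-in `sqlite3` module. It illustrates three operations of the general kind the abstract names: collapsing nodes into groups, filtering out process nodes to obtain a data-only view, and computing the lineage of a node with a recursive query.

```python
import sqlite3


def build_example():
    """Create a tiny in-memory provenance graph (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Assumed layout: one row per node, one row per dependency edge.
        CREATE TABLE node (id TEXT PRIMARY KEY, kind TEXT, grp TEXT);
        CREATE TABLE edge (src TEXT, dst TEXT);  -- dst depends on src
        INSERT INTO node VALUES
            ('d1', 'data',    'input'),
            ('p1', 'process', 'align'),
            ('d2', 'data',    'align'),
            ('p2', 'process', 'align'),
            ('d3', 'data',    'output');
        INSERT INTO edge VALUES
            ('d1', 'p1'), ('p1', 'd2'), ('d2', 'p2'), ('p2', 'd3');
    """)
    return conn


def collapse_by_group(conn):
    """Collapse: replace each node by its group and keep only cross-group edges."""
    return conn.execute("""
        SELECT DISTINCT s.grp, t.grp
        FROM edge e
        JOIN node s ON s.id = e.src
        JOIN node t ON t.id = e.dst
        WHERE s.grp <> t.grp
        ORDER BY s.grp, t.grp
    """).fetchall()


def data_only_view(conn):
    """Filter: hide process nodes by linking each input directly to each output."""
    return conn.execute("""
        SELECT DISTINCT e1.src, e2.dst
        FROM edge e1
        JOIN edge e2 ON e1.dst = e2.src
        JOIN node p  ON p.id = e1.dst AND p.kind = 'process'
        ORDER BY e1.src, e2.dst
    """).fetchall()


def lineage(conn, node_id):
    """Return every node that node_id transitively depends on."""
    rows = conn.execute("""
        WITH RECURSIVE anc(id) AS (
            SELECT src FROM edge WHERE dst = ?
            UNION
            SELECT e.src FROM edge e JOIN anc a ON e.dst = a.id
        )
        SELECT id FROM anc ORDER BY id
    """, (node_id,)).fetchall()
    return [r[0] for r in rows]


if __name__ == "__main__":
    conn = build_example()
    print(collapse_by_group(conn))
    print(data_only_view(conn))
    print(lineage(conn, "d3"))
```

Because each view is an ordinary query over the stored graph, views of this kind can be composed or materialized with standard database facilities; whether the paper's operations are defined this way is not stated in the abstract.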