Querying and Managing Provenance through User Views in Scientific Workflows

  • Authors:
  • Olivier Biton; Sarah Cohen-Boulakia; Susan B. Davidson; Carmem S. Hara

  • Affiliations:
  • University of Pennsylvania, Philadelphia, USA. biton@cis.upenn.edu; University of Pennsylvania, Philadelphia, USA / Université Paris-Sud 11, Orsay, France. sarahcb@cis.upenn.edu; University of Pennsylvania, Philadelphia, USA. susan@cis.upenn.edu; Universidade Federal do Paraná, Brazil. carmem@inf.ufpr.br

  • Venue:
  • ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
  • Year:
  • 2008

Abstract

Workflow systems have become increasingly popular for managing experiments where many bioinformatics tasks are chained together. Due to the large amount of data generated by these experiments and the need for reproducible results, provenance has become of paramount importance. Workflow systems are therefore starting to provide support for querying provenance. However, the amount of provenance information may be overwhelming, so there is a need for abstraction mechanisms to help users focus on the most relevant information. The technique we pursue is that of "user views." Since bioinformatics tasks may themselves be complex sub-workflows, a user view determines what level of sub-workflow the user can see, and thus what data and tasks are visible in provenance queries. In this paper, we formalize the notion of user views, demonstrate how they can be used in provenance queries, and give an algorithm for generating a user view based on which tasks are relevant for the user. We then describe our prototype and give performance results. Although presented in the context of scientific workflows, the technique applies to other data-oriented workflows.
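
To make the idea of a user view concrete, the following is a minimal sketch, not the paper's formal model or its view-generation algorithm (the abstract gives no details of either). It assumes a toy workflow run in which each data item records the task that produced it and the data it consumed, and a user view that maps each task to a composite task. All task names, data identifiers, and the `provenance` helper are hypothetical. Under these assumptions, a data item passed between two tasks of the same composite is hidden from the provenance answer.

```python
# Hypothetical toy run: data id -> (producing task, input data ids).
# A producing task of None marks an input supplied by the user.
run = {
    "d0": (None, []),
    "d1": ("split", ["d0"]),
    "d2": ("align", ["d1"]),
    "d3": ("format", ["d2"]),
    "d4": ("build_tree", ["d3"]),
}

# Hypothetical user view: each task is assigned to one composite task.
view = {
    "split": "Alignment",
    "align": "Alignment",
    "format": "Phylogeny",
    "build_tree": "Phylogeny",
}

# The finest view: every task is its own composite, so nothing is hidden.
identity_view = {task: task for task in view}


def provenance(item, run, view):
    """Return (composite tasks, visible data) upstream of `item`.

    A data item is reported only if it is a workflow input or if it
    crosses a boundary between two different composite tasks.
    """
    composites, data = set(), set()
    stack, seen = [item], set()
    while stack:
        d = stack.pop()
        if d in seen:
            continue
        seen.add(d)
        task, inputs = run[d]
        if task is None:
            data.add(d)  # workflow inputs are always visible
            continue
        composites.add(view[task])
        for i in inputs:
            producer = run[i][0]
            # Visible only when produced outside the consuming composite.
            if producer is None or view[producer] != view[task]:
                data.add(i)
            stack.append(i)
    return composites, data


coarse = provenance("d4", run, view)
fine = provenance("d4", run, identity_view)
```

With the coarse view, the answer for `d4` mentions only the two composites and the data `d0` and `d2`; with the identity view, all four tasks and every upstream data item appear. This illustrates only the abstraction effect the abstract describes, in which a coarser view yields a smaller provenance answer.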