OPQL: A First OPM-Level Query Language for Scientific Workflow Provenance

  • Authors:
  • Chunhyeok Lim; Shiyong Lu; Artem Chebotko; Farshad Fotouhi

  • Venue:
  • SCC '11 Proceedings of the 2011 IEEE International Conference on Services Computing
  • Year:
  • 2011

Abstract

Provenance is metadata that captures the derivation history of a data product, including its original data sources, intermediate products, and the steps applied to produce it. It has become increasingly important in services computing and scientific workflows for validating, interpreting, and analyzing the results of scientific computing. Most existing systems store captured provenance data in their own provenance stores, using proprietary provenance models, and process queries over the physical provenance store using query languages such as SQL, SPARQL, and XQuery, which are closely coupled to the underlying storage strategy. In this paper, we present OPQL, an OPM-level provenance query language that is defined directly over the Open Provenance Model (OPM). An OPQL query takes an OPM graph as input and produces an OPM graph as output; therefore, OPQL queries are not tightly coupled to the underlying provenance storage strategy. Our main contributions are: (i) we design OPQL, including graph patterns and an OPM-based graph algebra, to efficiently support provenance lineage queries; and (ii) we implement OPQL in our OPMPROV system, where the result of an OPQL query is displayed as an OPM graph via the OPMPROV browser. An experimental study evaluates the performance and feasibility of OPQL for provenance querying. To the best of our knowledge, OPQL is the first OPM-level query language for scientific workflow provenance.
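
The graph-in, graph-out idea behind a lineage query can be illustrated with a minimal Python sketch. This is not OPQL syntax and not the OPMPROV implementation: the function name lineage, the triple encoding of edges, and the sample node names (a1, p1, and so on) are assumptions made only for illustration. The sketch relies on the OPM convention that a dependency edge points from an effect to its cause, and it returns the subgraph on which a given node depends.

```python
from collections import deque

# Hypothetical encoding: each OPM dependency is a triple
# (effect, dependency_type, cause). In OPM, edges point from effect to cause.
def lineage(edges, start):
    """Return the edges reachable from `start` by following dependencies
    from effect to cause, i.e. the provenance subgraph of `start`."""
    by_effect = {}
    for edge in edges:
        by_effect.setdefault(edge[0], []).append(edge)

    seen = {start}
    result = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for edge in by_effect.get(node, []):
            result.append(edge)
            cause = edge[2]
            if cause not in seen:
                seen.add(cause)
                queue.append(cause)
    return result

# Illustrative input graph: artifacts a1..a4, processes p1..p3.
graph = [
    ("a3", "wasGeneratedBy", "p2"),
    ("p2", "used", "a2"),
    ("a2", "wasGeneratedBy", "p1"),
    ("p1", "used", "a1"),
    ("a4", "wasGeneratedBy", "p3"),
    ("p3", "used", "a1"),
]

# The output is again a set of OPM edges: the lineage of artifact a3.
upstream = lineage(graph, "a3")
```

Because the result is itself a list of OPM-style edges, it could be passed to another query of the same kind, which is the sort of composability that a graph algebra over OPM graphs is meant to provide. The traversal touches only the abstract graph and makes no assumption about how the edges are stored.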