Storing, reasoning, and querying OPM-compliant scientific workflow provenance using relational databases

  • Authors:
  • Chunhyeok Lim; Shiyong Lu; Artem Chebotko; Farshad Fotouhi

  • Affiliations:
  • Department of Computer Science, Wayne State University, Detroit, MI 48202, USA; Department of Computer Science, Wayne State University, Detroit, MI 48202, USA; Department of Computer Science, University of Texas-Pan American, Edinburg, TX 78539, USA; Department of Computer Science, Wayne State University, Detroit, MI 48202, USA

  • Venue:
  • Future Generation Computer Systems
  • Year:
  • 2011

Abstract

Provenance, the metadata that records the derivation history of scientific results, is essential in scientific workflows: it supports the reproducibility of scientific discovery, the interpretation of results, and the diagnosis of problems. To promote interoperability among heterogeneous provenance systems, the Open Provenance Model (OPM) was proposed in 2008 and has since played an important role in the community. In this paper, we present OPMProv, a scientific workflow provenance system that is built on a relational database and complies with OPM (v1.1). Our main contributions are: (i) we design an entity-relationship diagram for OPM and translate it into a relational database schema for the storage of provenance; (ii) we show that the provenance reasoning defined in OPM (v1.1) can be supported by OPMProv using only recursive views and SQL queries, without any additional reasoning engine. Experiments are conducted to evaluate the performance of OPMProv in data insertion and provenance querying. A case study demonstrates that OPMProv can answer 14 of the 16 queries defined in the Third Provenance Challenge.
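
To illustrate how a recursive view can stand in for a separate reasoning engine, the following is a minimal sketch, not OPMProv's actual schema. It assumes a single hypothetical table, was_derived_from, holding one-step derivation edges between artifacts, and defines a hypothetical view, multi_step_was_derived_from, as the transitive closure of that table. It uses SQLite through Python's standard sqlite3 module purely for convenience; the abstract does not say which database system OPMProv runs on.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a one-step edge table and a recursive view.

    Table and view names are hypothetical and chosen for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- One-step edges: `effect` was derived from `cause`.
        CREATE TABLE was_derived_from (
            effect TEXT NOT NULL,
            cause  TEXT NOT NULL,
            PRIMARY KEY (effect, cause)
        );

        -- Multi-step edges: transitive closure of the one-step edges.
        -- UNION (rather than UNION ALL) drops duplicate rows, so the
        -- recursion also terminates if the edge table contains a cycle.
        CREATE VIEW multi_step_was_derived_from AS
        WITH RECURSIVE closure(effect, cause) AS (
            SELECT effect, cause FROM was_derived_from
            UNION
            SELECT c.effect, d.cause
            FROM closure AS c
            JOIN was_derived_from AS d ON d.effect = c.cause
        )
        SELECT effect, cause FROM closure;
        """
    )
    # A small derivation chain: a4 <- a3 <- a2 <- a1.
    conn.executemany(
        "INSERT INTO was_derived_from (effect, cause) VALUES (?, ?)",
        [("a4", "a3"), ("a3", "a2"), ("a2", "a1")],
    )
    conn.commit()
    return conn


def ancestors(conn, artifact):
    """Return every artifact that `artifact` was derived from, directly or indirectly."""
    rows = conn.execute(
        "SELECT cause FROM multi_step_was_derived_from "
        "WHERE effect = ? ORDER BY cause",
        (artifact,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_demo_db()
    print(ancestors(conn, "a4"))
```

Because the closure lives in a view, a provenance query such as "which artifacts contributed to a4?" becomes an ordinary SELECT against that view, which is the general idea behind answering such queries with SQL alone.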