RDFProv: A relational RDF store for querying and managing scientific workflow provenance

Authors:
Artem Chebotko; Shiyong Lu; Xubo Fei; Farshad Fotouhi
Affiliations:
Department of Computer Science, University of Texas-Pan American, 1201 West University Drive, Edinburg, TX 78539, USA; Department of Computer Science, Wayne State University, 431 State Hall, 5143 Cass Avenue, Detroit, MI 48202, USA; Department of Computer Science, Wayne State University, 431 State Hall, 5143 Cass Avenue, Detroit, MI 48202, USA; Department of Computer Science, Wayne State University, 431 State Hall, 5143 Cass Avenue, Detroit, MI 48202, USA
Venue:
Data & Knowledge Engineering
Year:
2010

Abstract

Provenance metadata has become increasingly important for supporting the reproducibility of scientific discoveries, result interpretation, and problem diagnosis in scientific workflow environments. The provenance management problem concerns the efficiency and effectiveness of modeling, recording, representing, integrating, storing, and querying provenance metadata. Our approach to provenance management integrates the interoperability, extensibility, and inference advantages of Semantic Web technologies with the storage and querying power of an RDBMS to meet the emerging requirements of scientific workflow provenance management. In this paper, we elaborate on the design of a relational RDF store, called RDFProv, that is optimized for scientific workflow provenance querying and management. Specifically, we propose: (i) two schema mapping algorithms to map an OWL provenance ontology to a relational database schema that is optimized for common provenance queries; (ii) three efficient data mapping algorithms to map provenance RDF metadata to relational data according to the generated relational database schema; and (iii) a schema-independent SPARQL-to-SQL translation algorithm that is optimized on the fly using the type information of an instance available from the input provenance ontology and statistics on the sizes of the tables in the database. Experimental results show that our algorithms are efficient and scalable. A comparison with two popular relational RDF stores, Jena and Sesame, and two commercial native RDF stores, AllegroGraph and BigOWLIM, shows that our optimizations result in improved performance and scalability for provenance metadata management. Finally, our case study of provenance management in a real-life biological simulation workflow demonstrates the production quality and capability of the RDFProv system. Although presented in the context of scientific workflow provenance management, many of our proposed techniques apply to general RDF data management as well.
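
To make the idea of a relational RDF store with SPARQL-to-SQL translation concrete, the following is a minimal sketch and not the RDFProv algorithms themselves. It assumes a generic layout with one two-column table per predicate, uses SQLite in place of a full RDBMS, and handles only conjunctive triple patterns with constant predicates. The triples, the predicate names (`usedInput`, `producedOutput`), and the helper functions `load` and `translate` are all hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical provenance triples (subject, predicate, object); not from the paper.
TRIPLES = [
    ("run1", "usedInput", "fileA"),
    ("run1", "producedOutput", "fileB"),
    ("run2", "usedInput", "fileB"),
    ("run2", "producedOutput", "fileC"),
]


def load(conn, triples):
    """Store each predicate in its own two-column table (s, o).

    Assumed layout for illustration only. Predicate names are used as
    quoted table names, so they must come from a trusted source.
    """
    for s, p, o in triples:
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{p}" (s TEXT, o TEXT)')
        conn.execute(f'INSERT INTO "{p}" VALUES (?, ?)', (s, o))


def translate(patterns, select):
    """Translate (s, p, o) triple patterns into a single SQL join.

    Terms starting with '?' are variables; predicates must be constants.
    Returns the SQL text and the list of bound parameter values.
    """
    froms, wheres, params, bound = [], [], [], {}
    for i, (s, p, o) in enumerate(patterns):
        alias = f"t{i}"
        froms.append(f'"{p}" AS {alias}')
        for col, term in (("s", s), ("o", o)):
            ref = f"{alias}.{col}"
            if term.startswith("?"):
                if term in bound:
                    # Repeated variable: becomes a join condition.
                    wheres.append(f"{ref} = {bound[term]}")
                else:
                    # First occurrence: remember which column binds it.
                    bound[term] = ref
            else:
                # Constant term: becomes a parameterized selection.
                wheres.append(f"{ref} = ?")
                params.append(term)
    cols = ", ".join(f"{bound[v]} AS {v[1:]}" for v in select)
    sql = f"SELECT {cols} FROM {', '.join(froms)}"
    if wheres:
        sql += " WHERE " + " AND ".join(wheres)
    return sql, params


conn = sqlite3.connect(":memory:")
load(conn, TRIPLES)

# "Which runs consumed a file that run1 produced?"
sql, params = translate(
    [("run1", "producedOutput", "?f"), ("?r", "usedInput", "?f")],
    select=["?r"],
)
print(sql)
print(conn.execute(sql, params).fetchall())  # [('run2',)]
```

In this sketch each triple pattern becomes one table alias, a shared variable becomes an equality join, and a constant becomes a parameterized filter. A production store would also need optional patterns, filters, variable predicates, and an optimizer that chooses join order, none of which are attempted here.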