Ontology-Driven Provenance Management in eScience: An Application in Parasite Research

Authors:
Satya S. Sahoo;D. Brent Weatherly;Raghava Mutharaju;Pramod Anantharam;Amit Sheth;Rick L. Tarleton
Affiliations:
Kno.e.sis Center, Computer Science and Engineering Department, Wright State University, Dayton, USA 45435;Tarleton Research Group, CTEGD, University of Georgia, Athens, USA 30602;Kno.e.sis Center, Computer Science and Engineering Department, Wright State University, Dayton, USA 45435;Kno.e.sis Center, Computer Science and Engineering Department, Wright State University, Dayton, USA 45435;Kno.e.sis Center, Computer Science and Engineering Department, Wright State University, Dayton, USA 45435;Tarleton Research Group, CTEGD, University of Georgia, Athens, USA 30602
Venue:
OTM '09 Proceedings of the Confederated International Conferences, CoopIS, DOA, IS, and ODBASE 2009 on On the Move to Meaningful Internet Systems: Part II
Year:
2009

Abstract

Provenance, from the French word "provenir", describes the lineage or history of a data entity. Provenance is critical information in scientific applications for verifying the experimental process, validating data quality, and associating trust values with scientific results. Current industrial-scale eScience projects require an end-to-end provenance management infrastructure. This infrastructure needs to be underpinned by formal semantics so that software applications can analyze large-scale provenance information. Further, effective analysis of provenance information requires well-defined query mechanisms that support complex queries over large datasets. This paper introduces an ontology-driven provenance management infrastructure for biology experiment data, developed as part of the Semantic Problem Solving Environment (SPSE) for Trypanosoma cruzi (T. cruzi). This provenance infrastructure, called the T. cruzi Provenance Management System (PMS), is underpinned by (a) a domain-specific provenance ontology called the Parasite Experiment Ontology, (b) specialized query operators for provenance analysis, and (c) a provenance query engine. The query engine uses a novel optimization technique based on materialized views, called materialized provenance views (MPV), to scale with increasing data size and query complexity. This comprehensive ontology-driven provenance infrastructure not only allows effective tracking and management of ongoing experiments in the Tarleton Research Group at the Center for Tropical and Emerging Global Diseases (CTEGD), but also enables researchers to retrieve the complete provenance information of scientific results for publication in the literature.
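The abstract describes provenance retrieval and materialized provenance views only at a high level. As a rough illustration of the general idea, the sketch below assumes provenance is stored as a plain "derived-from" graph rather than the RDF/OWL representation defined by the Parasite Experiment Ontology. It walks the graph to collect the full lineage of a result, and it precomputes (materializes) those answers for selected entities so that repeated queries avoid the traversal. The entity names, `retrieve_provenance`, and `MaterializedProvenanceView` are hypothetical illustrations. They are not the PMS query operators or its actual MPV implementation.

```python
from collections import deque

# Hypothetical provenance graph: each data entity maps to the entities or
# processes it was directly derived from (edge direction: result -> inputs).
# Names are illustrative only, not taken from the T. cruzi PMS.
DERIVED_FROM: dict[str, list[str]] = {
    "knockout_strain": ["transfection", "plasmid_construct"],
    "transfection": ["parasite_culture", "plasmid_construct"],
    "plasmid_construct": ["pcr_product"],
    "pcr_product": ["gene_sequence"],
    "parasite_culture": [],
    "gene_sequence": [],
}


def retrieve_provenance(entity: str) -> set[str]:
    """Return every ancestor of `entity` via breadth-first traversal."""
    seen: set[str] = set()
    queue = deque(DERIVED_FROM.get(entity, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            # Enqueue this node's own inputs to follow the lineage further back.
            queue.extend(DERIVED_FROM.get(node, []))
    return seen


class MaterializedProvenanceView:
    """Precomputes lineage for chosen entities so repeated lookups skip traversal."""

    def __init__(self, entities: list[str]) -> None:
        # Materialize the provenance of each selected entity once, up front.
        self._view = {e: frozenset(retrieve_provenance(e)) for e in entities}

    def lookup(self, entity: str) -> frozenset[str]:
        # Serve from the materialized view when possible; otherwise traverse.
        if entity in self._view:
            return self._view[entity]
        return frozenset(retrieve_provenance(entity))


mpv = MaterializedProvenanceView(["knockout_strain"])
print(sorted(mpv.lookup("knockout_strain")))
```

Here the lineage of `knockout_strain` contains five ancestors (`gene_sequence`, `parasite_culture`, `pcr_product`, `plasmid_construct`, `transfection`), and a second lookup is a dictionary hit instead of a graph walk. The trade-off a real system must handle, and that this sketch ignores, is keeping the materialized view consistent when new provenance records are added.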