Implementing interoperable provenance in biomedical research

Authors:
V. Curcin;S. Miles;R. Danger;Y. Chen;R. Bache;A. Taweel
Venue:
Future Generation Computer Systems
Year:
2014

Abstract

The provenance of a piece of data is knowledge about its origin, expressed in terms of the entities and actors involved in its creation, e.g. the data sources used, the operations carried out on them, and the users enacting those operations. Provenance is used to better understand the data and the context of its production, and to assess its reliability by establishing whether correct procedures were followed. Providing evidence for validating research is of particular importance in the biomedical domain, where the strength of the results depends on the data sources and processes used. Previously manual processes, such as clinical trial recruitment, epidemiological studies and diagnosis, have recently become fully or semi-automated. This automation is typically achieved through the interaction of heterogeneous software systems across multiple settings (hospitals, clinics, academic and industrial research organisations). Provenance traces from these systems need to be integrated in a consistent and meaningful manner, but since the systems rarely share a common platform, provenance interoperability between them has to be achieved at the level of conceptual models. Determining where to start in making a biomedical software system provenance-aware is a non-trivial matter. In this paper, we give developers recommendations on how to approach provenance modelling, capture, security, storage and querying, based on our experiences with two large-scale biomedical research projects: Translational Research and Patient Safety in Europe (TRANSFoRm) and Electronic Health Records for Clinical Research (EHR4CR). While illustrated with concrete issues encountered in these projects, the recommendations are sufficiently high-level to be reusable across the biomedical domain.