A system for managing data provenance in in silico experiments

  • Authors:
  • Jarrod Trevathan; Ian M. Atkinson; Wayne W. Read; Nigel Sim; Chris Christensen

  • Affiliations:
  • James Cook University, Douglas, Townsville, Queensland, Australia (all authors)

  • Venue:
  • ADC '11: Proceedings of the Twenty-Second Australasian Database Conference, Volume 115
  • Year:
  • 2011

Abstract

In silico experiments use computers or computer simulation to speed up the rate at which scientific discoveries are made. However, the voluminous data generated in such experiments are often recorded in an ad hoc manner, without regard to workflow, and often lack rigorous business rules. The absence of stringent auditing and reporting policies makes it difficult to repeat experiments and largely denies independent parties the ability to verify study results. This paper presents a data provenance management system that demonstrates the utility of the ICAT metadata storage service as a viable schema for representing in silico experiments. The system provides a portal interface that integrates ICAT with job execution, and we have built it on a data repository that can handle data of arbitrary size, complexity and type. The system can be used in practice to compare, validate and aid in the repetition of historic experiments. Furthermore, data can be verified against external repositories and sources, which will ultimately enhance the scientific merit of in silico experimentation. Because the proposed system augments existing applications, users do not need to modify their current experimentation platform. A test case from a pharmacological study illustrates the system's versatility for reporting on and auditing experiments and their results.
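To make the idea of recording and later verifying an experiment run more concrete, the sketch below captures one job execution as a provenance record holding the command, its parameters and SHA-256 checksums of its inputs and outputs. A later run, or an independent party, can then check whether reproduced outputs match the recorded ones. This is a minimal sketch under stated assumptions: `ProvenanceRecord`, `record_run` and `verify_outputs` are hypothetical names, files are held in memory as bytes, and nothing here reproduces the ICAT schema or the authors' portal.

```python
import hashlib
import json
from dataclasses import dataclass, field, asdict


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of raw bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class ProvenanceRecord:
    """Hypothetical record of one in silico job run (not the ICAT schema)."""
    experiment_id: str
    command: str
    parameters: dict
    input_checksums: dict = field(default_factory=dict)
    output_checksums: dict = field(default_factory=dict)

    def to_json(self) -> str:
        # Sorted keys so identical records always serialise identically.
        return json.dumps(asdict(self), sort_keys=True)


def record_run(experiment_id, command, parameters, inputs, outputs):
    """Build a record from in-memory input/output files (name -> bytes)."""
    return ProvenanceRecord(
        experiment_id=experiment_id,
        command=command,
        parameters=dict(parameters),
        input_checksums={n: sha256_hex(b) for n, b in inputs.items()},
        output_checksums={n: sha256_hex(b) for n, b in outputs.items()},
    )


def verify_outputs(record, outputs):
    """Return sorted names of recorded outputs that are missing or changed."""
    mismatched = []
    for name, expected in record.output_checksums.items():
        data = outputs.get(name)
        if data is None or sha256_hex(data) != expected:
            mismatched.append(name)
    return sorted(mismatched)


if __name__ == "__main__":
    # Hypothetical pharmacology-style run; the command and files are illustrative.
    rec = record_run(
        "exp-001",
        "dock --ligand ligand.pdb",
        {"seed": 42},
        {"ligand.pdb": b"ATOM 1"},
        {"scores.csv": b"ligand,score\nA,-7.1\n"},
    )
    print(verify_outputs(rec, {"scores.csv": b"ligand,score\nA,-7.1\n"}))
    print(verify_outputs(rec, {"scores.csv": b"ligand,score\nA,-6.9\n"}))
```

Checksums are used here because they let a verifier detect any change to a stored result without needing the original files side by side; a real provenance store would also persist the serialised record alongside the data.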