Achieving reproducibility by combining provenance with service and workflow versioning

  • Authors:
  • Simon Woodman; Hugo Hiden; Paul Watson; Paolo Missier

  • Affiliations:
  • Newcastle University, Newcastle upon Tyne, United Kingdom (all four authors)

  • Venue:
  • Proceedings of the 6th workshop on Workflows in support of large-scale science
  • Year:
  • 2011

Abstract

Capturing and exploiting provenance information is considered important across a range of scientific, medical, commercial and Web applications, including the recent trend towards publishing provenance-rich, executable papers. This paper shows how the range of useful questions that provenance can answer is greatly increased when it is incorporated into a system that can store and execute both current and old versions of workflows and services. e-Science Central provides a scalable, secure cloud platform for application developers. Developers can use it to upload data (for storage in the cloud) and services, which can be written in a variety of languages. These services can then be combined into workflows, which are enacted in the cloud to compute over the data. When a workflow runs, a complete provenance trace is recorded. This paper shows how this provenance trace, used in conjunction with the ability to execute old versions of services and workflows (rather than only the latest versions), can provide useful information that would otherwise be unavailable, including the key ability to reproduce experiments and to compare the effects of old and new versions of services on computations.
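The central idea, that a provenance trace becomes far more useful when every version of each service it mentions remains executable, can be illustrated with a small sketch. This is not e-Science Central's API: `ServiceRegistry`, `ProvenanceStep`, `run_workflow`, `reproduce` and `rerun_with_latest` are hypothetical names, services are modelled as plain Python functions on a single number, and only service versions are tracked (workflow definitions could be versioned the same way).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical model: a service is a function from one value to another.
Service = Callable[[float], float]


@dataclass
class ServiceRegistry:
    # Every published version of every service is retained, not just the latest.
    versions: Dict[str, Dict[int, Service]] = field(default_factory=dict)

    def publish(self, name: str, version: int, fn: Service) -> None:
        self.versions.setdefault(name, {})[version] = fn

    def get(self, name: str, version: int) -> Service:
        return self.versions[name][version]

    def latest(self, name: str) -> int:
        return max(self.versions[name])


@dataclass
class ProvenanceStep:
    # One entry in the provenance trace: which exact version ran, on what, producing what.
    service: str
    version: int
    input: float
    output: float


def run_workflow(registry: ServiceRegistry, steps: List[Tuple[str, int]],
                 data: float) -> Tuple[float, List[ProvenanceStep]]:
    """Run a linear workflow of (service, version) pairs, recording a provenance trace."""
    trace: List[ProvenanceStep] = []
    for name, version in steps:
        out = registry.get(name, version)(data)
        trace.append(ProvenanceStep(name, version, data, out))
        data = out
    return data, trace


def reproduce(registry: ServiceRegistry, trace: List[ProvenanceStep]) -> bool:
    """Re-execute each step with the version pinned in the trace and check the outputs match."""
    return all(
        registry.get(s.service, s.version)(s.input) == s.output for s in trace
    )


def rerun_with_latest(registry: ServiceRegistry, trace: List[ProvenanceStep],
                      data: float) -> Tuple[float, List[ProvenanceStep]]:
    """Replay the same workflow structure, but with the newest version of each service."""
    steps = [(s.service, registry.latest(s.service)) for s in trace]
    return run_workflow(registry, steps, data)


# Usage: run once, publish a newer service version, then reproduce and compare.
registry = ServiceRegistry()
registry.publish("scale", 1, lambda x: x * 2)
registry.publish("offset", 1, lambda x: x + 1)

result, trace = run_workflow(registry, [("scale", 1), ("offset", 1)], 3.0)

registry.publish("scale", 2, lambda x: x * 3)  # a later version of the service

reproduced = reproduce(registry, trace)             # old versions still runnable
new_result, _ = rerun_with_latest(registry, trace, 3.0)
```

Because the trace pins exact versions and the registry never discards old ones, the original run can still be reproduced after `scale` is updated, and re-running with the latest versions shows how the update changes the result (here 7.0 versus 10.0).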