Enabling provenance on large scale e-science applications

  • Authors:
  • Miguel Branco; Luc Moreau

  • Affiliations:
  • CERN, European Organization for Nuclear Research, Genève; University of Southampton, Southampton, United Kingdom

  • Venue:
  • IPAW'06: Proceedings of the 2006 International Conference on Provenance and Annotation of Data
  • Year:
  • 2006

Abstract

Large-scale e-Science experiments present unprecedented data-handling requirements, with multi-petabyte data stores. Complex software applications, such as those of the ATLAS High Energy Physics experiment at CERN, run at Grid computing sites around the world in a distributed environment, with scientists concurrently analysing data and producing new data products that are shared across the collaboration. In this paper, we introduce a multi-phase infrastructure to achieve data provenance for an e-Science experiment. We propose an infrastructure that integrates provenance into an existing legacy application with a strong emphasis on scalability. We also explore the relationship between provenance and metadata, introducing a model in which data provenance is made available as metadata through a separate reasoning phase.