Exploring provenance in high performance scientific computing

  • Authors: Luiz Manoel Rocha Gadelha, Junior; Michael Wilde; Marta Mattoso; Ian Foster

  • Affiliations: Federal University of Rio de Janeiro & National Laboratory for Scientific Computing, Rio de Janeiro, Brazil; Argonne National Laboratory & University of Chicago, Argonne, USA; Federal University of Rio de Janeiro, Rio de Janeiro, Brazil; Argonne National Laboratory & University of Chicago, Argonne, USA

  • Venue: Proceedings of the First Annual Workshop on High Performance Computing Meets Databases
  • Year: 2011

Abstract

Large-scale scientific computations are often organized as compositions of many computational tasks linked through data flow. After a computational experiment completes, the scientist must analyze its outcome, for instance by checking the inputs and outputs of the tasks that make up the experiment. This analysis can be automated with provenance management systems, which record, among other things, the production and consumption relationships between data artifacts, such as files, and the computational tasks that compose the scientific application. In this article, we explore the relationship between high performance computing and provenance management systems. We observe that storing provenance as structured data, enriched with information about the runtime behavior of computational tasks in high performance computing environments, enables queries that correlate computational resource usage, scientific parameters, and data set derivation. We briefly describe how the provenance of many-task scientific computations specified and coordinated by the Swift parallel scripting system is gathered and queried.
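
To make the idea of provenance as structured, queryable data concrete, the following is a minimal sketch in Python using an in-memory SQLite database. It is an illustration only and is not the schema or query interface used by Swift: the table names (`task`, `data_use`), the column names, the helper functions, and the sample rows are all hypothetical. It shows two kinds of query the abstract mentions: tracing data set derivation through production and consumption relationships, and relating a scientific parameter to runtime resource usage.

```python
import sqlite3


def build_provenance_db():
    """Create a toy provenance store (hypothetical schema and sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE task (
            task_id      TEXT PRIMARY KEY,
            app_name     TEXT,
            param_value  REAL,   -- a scientific parameter of the run
            wall_seconds REAL,   -- runtime behavior recorded at execution
            max_rss_mb   REAL
        );
        CREATE TABLE data_use (
            task_id   TEXT REFERENCES task(task_id),
            file_name TEXT,
            direction TEXT CHECK (direction IN ('in', 'out'))
        );
    """)
    conn.executemany("INSERT INTO task VALUES (?, ?, ?, ?, ?)", [
        ("t1", "simulate", 0.1, 12.0, 200.0),
        ("t2", "simulate", 0.5, 48.0, 350.0),
        ("t3", "analyze", None, 5.0, 80.0),
    ])
    conn.executemany("INSERT INTO data_use VALUES (?, ?, ?)", [
        ("t1", "input.dat", "in"), ("t1", "sim_a.out", "out"),
        ("t2", "input.dat", "in"), ("t2", "sim_b.out", "out"),
        ("t3", "sim_a.out", "in"), ("t3", "sim_b.out", "in"),
        ("t3", "summary.txt", "out"),
    ])
    return conn


def ancestors(conn, file_name):
    """Return every file that file_name was transitively derived from."""
    rows = conn.execute("""
        WITH RECURSIVE lineage(file_name) AS (
            -- direct inputs of the task that produced the target file
            SELECT i.file_name
            FROM data_use o
            JOIN data_use i ON i.task_id = o.task_id
            WHERE o.direction = 'out' AND i.direction = 'in'
              AND o.file_name = ?
            UNION
            -- inputs of the tasks that produced files already in the lineage
            SELECT i.file_name
            FROM lineage l
            JOIN data_use o ON o.file_name = l.file_name AND o.direction = 'out'
            JOIN data_use i ON i.task_id = o.task_id AND i.direction = 'in'
        )
        SELECT file_name FROM lineage ORDER BY file_name
    """, (file_name,)).fetchall()
    return [row[0] for row in rows]


def runtime_by_parameter(conn, app_name):
    """Pair each run's parameter value with its wall-clock time."""
    return conn.execute(
        "SELECT param_value, wall_seconds FROM task "
        "WHERE app_name = ? ORDER BY param_value",
        (app_name,),
    ).fetchall()


if __name__ == "__main__":
    db = build_provenance_db()
    print(ancestors(db, "summary.txt"))
    print(runtime_by_parameter(db, "simulate"))
```

In this sketch the derivation query is a recursive walk over the producer and consumer records, while the resource query is an ordinary selection over per-task runtime columns. Keeping both in one relational store is what allows the two kinds of information to be joined in a single query.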