Exploring provenance in high performance scientific computing

Authors:
Luiz Manoel Rocha Gadelha, Junior; Michael Wilde; Marta Mattoso; Ian Foster
Affiliations:
Federal University of Rio de Janeiro & National Laboratory for Scientific Computing, Rio de Janeiro, Brazil; Argonne National Laboratory & University of Chicago, Argonne, USA; Federal University of Rio de Janeiro, Rio de Janeiro, Brazil; Argonne National Laboratory & University of Chicago, Argonne, USA
Venue:
Proceedings of the first annual workshop on High performance computing meets databases
Year:
2011

Citing 14
Cited 0

Chimera: A Virtual Data System for Representing, Querying, and Automating Data Derivation

SSDBM '02 Proceedings of the 14th International Conference on Scientific and Statistical Database Management
The Anatomy of the Grid: Enabling Scalable Virtual Organizations

International Journal of High Performance Computing Applications
A survey of data provenance in e-science

ACM SIGMOD Record
Tracking provenance in a virtual data grid

Concurrency and Computation: Practice & Experience - The First Provenance Challenge
Provenance and scientific workflows: challenges and opportunities

Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Workflows and e-Science: An overview of workflow system features and capabilities

Future Generation Computer Systems
Parallel Scripting for Applications at the Petascale and Beyond

Computer
A view of cloud computing

Communications of the ACM
Layering in provenance systems

USENIX'09 Proceedings of the 2009 conference on USENIX Annual technical conference
The Open Provenance Model core specification (v1.1)

Future Generation Computer Systems
Provenance management in Swift

Future Generation Computer Systems
Managing rapidly-evolving scientific workflows

IPAW'06 Proceedings of the 2006 international conference on Provenance and Annotation of Data
Applying the virtual data provenance model

IPAW'06 Proceedings of the 2006 international conference on Provenance and Annotation of Data
Swift: A language for distributed parallel scripting

Parallel Computing

Abstract

Large-scale scientific computations are often organized as a composition of many computational tasks linked through data flow. After a computational scientific experiment completes, a scientist has to analyze its outcome, for instance by checking the inputs and outputs of the computational tasks that make up the experiment. This analysis can be automated using provenance management systems, which describe, for instance, the production and consumption relationships between data artifacts, such as files, and the computational tasks that compose the scientific application. In this article, we explore the relationship between high performance computing and provenance management systems. We observe that storing provenance as structured data, enriched with information about the runtime behavior of computational tasks in high performance computing environments, can enable queries that correlate computational resource usage, scientific parameters, and data set derivation. We briefly describe how the provenance of many-task scientific computations specified and coordinated by the Swift parallel scripting system is gathered and queried.
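
To make the idea of provenance stored as structured data more concrete, the following is a minimal sketch in Python using the standard sqlite3 module. It is not the schema or query interface of Swift or of any system named in this record: the table names (task, dataset, consumed, produced), the columns (param, runtime_s, host), and all sample rows are assumptions invented for illustration. The sketch records which tasks consumed and produced which files, then traces the tasks a given file was derived from and adds up their runtimes, which is one possible way to relate data set derivation to resource usage.

```python
import sqlite3


def build_example_db():
    """Create an in-memory provenance store with hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE task (id TEXT PRIMARY KEY, app TEXT, param REAL,
                           runtime_s REAL, host TEXT);
        CREATE TABLE dataset (id TEXT PRIMARY KEY, filename TEXT);
        CREATE TABLE consumed (task_id TEXT, dataset_id TEXT);
        CREATE TABLE produced (task_id TEXT, dataset_id TEXT);
    """)
    # Two simulation tasks with different scientific parameters, one analysis.
    conn.executemany("INSERT INTO task VALUES (?, ?, ?, ?, ?)", [
        ("t1", "simulate", 0.1, 12.0, "node01"),
        ("t2", "simulate", 0.5, 48.0, "node02"),
        ("t3", "analyze", None, 3.0, "node01"),
    ])
    conn.executemany("INSERT INTO dataset VALUES (?, ?)", [
        ("d0", "input.dat"), ("d1", "sim_a.out"),
        ("d2", "sim_b.out"), ("d3", "summary.txt"),
    ])
    conn.executemany("INSERT INTO consumed VALUES (?, ?)", [
        ("t1", "d0"), ("t2", "d0"), ("t3", "d1"), ("t3", "d2"),
    ])
    conn.executemany("INSERT INTO produced VALUES (?, ?)", [
        ("t1", "d1"), ("t2", "d2"), ("t3", "d3"),
    ])
    return conn


def upstream_tasks(conn, dataset_id):
    """Return ids of all tasks the dataset was directly or indirectly derived from."""
    rows = conn.execute("""
        WITH RECURSIVE upstream(task_id) AS (
            SELECT task_id FROM produced WHERE dataset_id = ?
            UNION
            SELECT p.task_id
            FROM upstream u
            JOIN consumed c ON c.task_id = u.task_id
            JOIN produced p ON p.dataset_id = c.dataset_id
        )
        SELECT task_id FROM upstream ORDER BY task_id
    """, (dataset_id,)).fetchall()
    return [row[0] for row in rows]


def derivation_cost(conn, dataset_id):
    """Sum the runtime (seconds) of every task in the dataset's derivation."""
    ids = upstream_tasks(conn, dataset_id)
    if not ids:
        return 0.0  # a raw input has no producing tasks
    placeholders = ",".join("?" for _ in ids)
    row = conn.execute(
        f"SELECT SUM(runtime_s) FROM task WHERE id IN ({placeholders})", ids
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    conn = build_example_db()
    print(upstream_tasks(conn, "d3"))
    print(derivation_cost(conn, "d3"))
```

A relational layout is only one design choice for this sketch; it was picked because lineage tracing maps directly onto a recursive query, and because runtime attributes can sit in the same table as the scientific parameter they are to be correlated with.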