Large-scale scientific computations are often organized as compositions of many computational tasks linked through data flow. After a computational scientific experiment completes, the scientist must analyze its outcome, for instance by checking the inputs and outputs of the computational tasks that make up the experiment. Provenance management systems can automate this analysis by describing, for example, the production and consumption relationships between data artifacts, such as files, and the computational tasks that compose the scientific application. In this article, we explore the relationship between high-performance computing and provenance management systems. We observe that storing provenance as structured data, enriched with information about the runtime behavior of computational tasks in high-performance computing environments, enables queries that correlate computational resource usage, scientific parameters, and data set derivation. We briefly describe how the provenance of many-task scientific computations specified and coordinated by the Swift parallel scripting system is gathered and queried.
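To make the idea of provenance stored as structured data more concrete, the following is a minimal sketch in Python using the standard sqlite3 module. It is not the schema or query interface used by Swift: the table names (task, uses, produces), the columns (param, site, wall_seconds), and the sample rows are all hypothetical and chosen only for illustration. The sketch records which tasks consume and produce which files, attaches runtime information to each task, and then walks the derivation history of a file with a recursive query so that resource usage and scientific parameters can be read alongside the lineage.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory provenance store with hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE task (
            id           TEXT PRIMARY KEY,
            app          TEXT,   -- name of the invoked application
            param        REAL,   -- a scientific parameter (hypothetical)
            site         TEXT,   -- execution site (hypothetical)
            wall_seconds REAL    -- runtime behavior: wall-clock time
        );
        CREATE TABLE uses     (task_id TEXT, file TEXT);  -- consumption
        CREATE TABLE produces (task_id TEXT, file TEXT);  -- production
    """)
    conn.executemany("INSERT INTO task VALUES (?, ?, ?, ?, ?)", [
        ("t1", "simulate", 0.1, "clusterA", 120.0),
        ("t2", "simulate", 0.5, "clusterB", 300.0),
        ("t3", "analyze", None, "clusterA", 45.0),
    ])
    conn.executemany("INSERT INTO uses VALUES (?, ?)", [
        ("t1", "input.dat"), ("t2", "input.dat"),
        ("t3", "sim1.out"), ("t3", "sim2.out"),
    ])
    conn.executemany("INSERT INTO produces VALUES (?, ?)", [
        ("t1", "sim1.out"), ("t2", "sim2.out"), ("t3", "report.txt"),
    ])
    return conn


def ancestors(conn, file):
    """Return every task in the derivation history of `file`.

    The recursive query starts from the task that produced the file and
    repeatedly follows consumed files back to the tasks that produced them.
    """
    return conn.execute("""
        WITH RECURSIVE lineage(task_id) AS (
            SELECT task_id FROM produces WHERE file = ?
            UNION
            SELECT p.task_id
            FROM lineage AS l
            JOIN uses AS u ON u.task_id = l.task_id
            JOIN produces AS p ON p.file = u.file
        )
        SELECT t.id, t.app, t.param, t.site, t.wall_seconds
        FROM lineage JOIN task AS t ON t.id = lineage.task_id
        ORDER BY t.id
    """, (file,)).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    rows = ancestors(conn, "report.txt")
    for row in rows:
        print(row)
    # Correlate derivation with resource usage: total time spent on lineage.
    print("total wall seconds:", sum(r[4] for r in rows))
```

Keeping the runtime columns on the same task records that anchor the production and consumption relations is what lets a single query return both the lineage of a data set and the resources and parameters involved in deriving it.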