Scientific research increasingly relies on computer-based experiments. Such experiments are often composed of a vast number of loosely coupled computational tasks that are specified and automated as scientific workflows. The data flowing through such "many-task" computations (MTC) is similarly large in scale. Provenance information can record the behavior of these computational experiments through the lineage of process and data artifacts. However, work to date has focused on lineage data models, leaving unsolved the problems of recording and querying other aspects, such as domain-specific information about the experiments, MTC behavior as reflected in resource consumption and failure information, and the impact of the environment on performance and accuracy. In this work we contribute MTCProv, a provenance query framework for many-task scientific computing that captures the runtime execution details of MTC workflow tasks on parallel and distributed systems, in addition to standard prospective and data derivation provenance. To help users query provenance data, we provide a high-level interface that hides relational query complexities. We evaluate MTCProv using an application in protein science and describe how it simplifies the expression of important query patterns, such as correlations between provenance, runtime data, and scientific parameters.
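To make the last query pattern concrete, the following is a minimal sketch of how a helper function might hide a relational join that correlates runtime data with a scientific parameter. It is not MTCProv's actual schema or interface: the tables `task`, `runtime`, and `annotation`, the column names, the function `runtime_by_parameter`, and the sample rows are all hypothetical and chosen only for illustration.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical provenance schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE task (task_id TEXT PRIMARY KEY, app_name TEXT);
        CREATE TABLE runtime (task_id TEXT, wall_seconds REAL, exit_code INTEGER);
        CREATE TABLE annotation (task_id TEXT, name TEXT, value REAL);
    """)
    # Made-up sample rows: three tasks of one application, one of which failed.
    conn.executemany("INSERT INTO task VALUES (?, ?)",
                     [("t1", "fold"), ("t2", "fold"), ("t3", "fold")])
    conn.executemany("INSERT INTO runtime VALUES (?, ?, ?)",
                     [("t1", 10.0, 0), ("t2", 20.0, 0), ("t3", 5.0, 1)])
    conn.executemany("INSERT INTO annotation VALUES (?, ?, ?)",
                     [("t1", "temperature", 100.0),
                      ("t2", "temperature", 200.0),
                      ("t3", "temperature", 100.0)])
    return conn


def runtime_by_parameter(conn, app_name, param_name):
    """Return (parameter value, mean wall time, task count) for successful tasks.

    The caller never writes the three-way join; this is the kind of
    relational detail a higher-level query interface could hide.
    """
    return conn.execute("""
        SELECT a.value, AVG(r.wall_seconds), COUNT(*)
        FROM task t
        JOIN runtime r ON r.task_id = t.task_id
        JOIN annotation a ON a.task_id = t.task_id
        WHERE t.app_name = ? AND a.name = ? AND r.exit_code = 0
        GROUP BY a.value
        ORDER BY a.value
    """, (app_name, param_name)).fetchall()


if __name__ == "__main__":
    for row in runtime_by_parameter(build_demo_db(), "fold", "temperature"):
        print(row)
```

In this sketch the failed task is excluded by the `exit_code = 0` filter, so failure information and resource consumption sit in the same query as the scientific parameter.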