On Plug-ins and Extensible Architectures
Queue - Patching and Deployment
VisTrails: visualization meets data management
Proceedings of the 2006 ACM SIGMOD international conference on Management of data
Provenance Services for Distributed Workflows
CCGRID '08 Proceedings of the 2008 Eighth IEEE International Symposium on Cluster Computing and the Grid
Provenance for Computational Tasks: A Survey
Computing in Science and Engineering
MapReduce: a flexible data processing tool
Communications of the ACM - Amir Pnueli: Ahead of His Time
Pipeline-centric provenance model
Proceedings of the 4th Workshop on Workflows in Support of Large-Scale Science
Scientific workflows and clouds
Crossroads - Plugging Into the Cloud
CLOUD '10 Proceedings of the 2010 IEEE 3rd International Conference on Cloud Computing
Improving workflow fault tolerance through provenance-based recovery
SSDBM'11 Proceedings of the 23rd international conference on Scientific and statistical database management
Scheduling Scientific Workflows Elastically for Cloud Computing
CLOUD '11 Proceedings of the 2011 IEEE 4th International Conference on Cloud Computing
Many task computing for orthologous genes identification in protozoan genomes using Hydra
Concurrency and Computation: Practice & Experience
Supporting dynamic parameter sweep in adaptive and user-steered workflow
Proceedings of the 6th workshop on Workflows in support of large-scale science
Performance evaluation of the karma provenance framework for scientific workflows
IPAW'06 Proceedings of the 2006 international conference on Provenance and Annotation of Data
Inside "Big Data management": ogres, onions, or parfaits?
Proceedings of the 15th International Conference on Extending Database Technology
An adaptive parallel execution strategy for cloud-based scientific workflows
Concurrency and Computation: Practice & Experience
A Provenance-based Adaptive Scheduling Heuristic for Parallel Scientific Workflows in Clouds
Journal of Grid Computing
Enabling re-executions of parallel scientific workflows using runtime provenance data
IPAW'12 Proceedings of the 4th international conference on Provenance and Annotation of Data and Processes
Dimensioning the virtual cluster for parallel scientific workflows in clouds
Proceedings of the 4th ACM workshop on Scientific cloud computing
User-steering of HPC workflows: state-of-the-art and future directions
Proceedings of the 2nd ACM SIGMOD Workshop on Scalable Workflow Execution Engines and Technologies
Runtime Dynamic Structural Changes of Scientific Workflows in Clouds
UCC '13 Proceedings of the 2013 IEEE/ACM 6th International Conference on Utility and Cloud Computing
Hi-index | 0.00 |
Scientific workflows are commonly used to model and execute large-scale scientific experiments. They represent key resources for scientists and are enacted and managed by Scientific Workflow Management Systems (SWfMS). Each SWfMS has its particular approach to execute workflows and to capture and manage their provenance data. Due to the large scale of experiments, it may be unviable to analyze provenance data only after the end of the execution. A single experiment may demand weeks to run, even in high performance computing environments. Thus scientists need to monitor the experiment during its execution, and this can be done through provenance data. Runtime provenance analysis allows for scientists to monitor workflow execution and to take actions before the end of it (i.e. workflow steering). This provenance data can also be used to fine-tune the parallel execution of the workflow dynamically. We use the PROV data model as a basic framework for modeling and providing runtime provenance as a database that can be queried even during the execution. This database is agnostic of SWfMS and workflow engine. We show the benefits of representing and sharing runtime provenance data for improving the experiment management as well as the analysis of the scientific data.