This paper describes the design and implementation of CloudWF, a scalable and lightweight computational workflow system for clouds, built on top of Hadoop. CloudWF can run workflow jobs composed of multiple Hadoop MapReduce or legacy programs. Its novelty lies in four aspects: a simple workflow description language that encodes workflow blocks and block-to-block dependencies separately as standalone executable components; a new workflow storage method that uses Hadoop HBase sparse tables to store workflow information internally and to reconstruct workflow block dependencies implicitly for efficient workflow execution; transparent file staging with Hadoop DFS; and decentralized workflow execution management that relies on the MapReduce framework for task scheduling and fault tolerance.
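The idea of storing blocks and block-to-block dependencies separately in a sparse table, then recovering the dependency structure from the stored columns, can be illustrated with a minimal in-memory sketch. This is not CloudWF's actual schema or code: the `SparseTable` class, the `info:`, `in:` and `out:` column names, and all function names are hypothetical, and a plain dictionary stands in for HBase.

```python
class SparseTable:
    """Toy stand-in for an HBase sparse table: row key -> {column: value}.
    Hypothetical; it only mimics the 'rows with arbitrary columns' idea."""

    def __init__(self):
        self.rows = {}

    def put(self, row, column, value):
        self.rows.setdefault(row, {})[column] = value

    def get(self, row):
        return dict(self.rows.get(row, {}))


def add_block(table, name, command):
    # A block is a standalone executable component (assumed columns).
    table.put(name, "info:command", command)
    table.put(name, "info:status", "waiting")


def add_connector(table, src, dst):
    # A dependency is stored on its own, as sparse columns on both rows.
    table.put(src, "out:" + dst, "1")
    table.put(dst, "in:" + src, "1")


def ready_blocks(table):
    """Blocks that are waiting and whose upstream blocks are all done.
    Dependencies are reconstructed from the 'in:' column names alone."""
    ready = []
    for name in sorted(table.rows):
        row = table.rows[name]
        if row.get("info:status") != "waiting":
            continue
        parents = [c[len("in:"):] for c in row if c.startswith("in:")]
        if all(table.get(p).get("info:status") == "done" for p in parents):
            ready.append(name)
    return ready


def run_workflow(table):
    """Simulate execution wave by wave; returns the completion order.
    A real system would submit each ready block as a job instead."""
    order = []
    while True:
        wave = ready_blocks(table)
        if not wave:
            return order
        for name in wave:
            table.put(name, "info:status", "done")
            order.append(name)


if __name__ == "__main__":
    t = SparseTable()
    for block in ("A", "B", "C", "D"):
        add_block(t, block, "run_" + block)
    for src, dst in (("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")):
        add_connector(t, src, dst)
    print(run_workflow(t))
```

Because each row carries only the columns it needs, a block with no upstream dependencies simply has no `in:` columns, which is the property that makes a sparse table a natural fit for this kind of bookkeeping.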