XML process networks are a simple yet powerful programming paradigm for loosely coupled, coarse-grained dataflow applications such as data-centric scientific workflows. We describe a framework called Delta-XML that is well suited to applications in which pipelines of data processors modify parts ("deltas") of XML data collections while keeping the overall collection structure intact. We show how to optimize the execution of Delta-XML process networks by minimizing the data shipping cost in distributed settings. This X-CSR** optimization employs static type inference based on XML Schema to determine the XML stream fragments that are relevant to a processor, allowing irrelevant fragments to be bypassed ("shipped") to downstream pipeline steps. Finally, we present evaluation results for a real-world scientific workflow, which show the practical feasibility of X-CSR. A long version of this paper is available as a technical report (http://www.cs.ucdavis.edu/research/tech-reports/2008/CSE-2008-15.pdf).

** X-CSR: _X_ML _C_ut, _S_hip, and _R_eassemble; pronounced "X-scissor".
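The cut, ship, and reassemble idea can be illustrated with a small in-memory sketch. This is not the Delta-XML implementation: it assumes relevance is given as a plain set of tag names (standing in for the schema-based type inference described above), it works on a whole tree rather than a stream, and the names `cut`, `reassemble`, `uppercase_sequence`, the `hole` placeholder element, and the sample document are all hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

HOLE = "hole"  # hypothetical placeholder tag marking where a fragment was cut out


def cut(root, relevant_tags):
    """Split a collection into a skeleton (to be bypassed) and relevant fragments.

    Assumption: relevance is decided by tag name only. The root element itself
    is never cut, and cut fragments are not searched for nested matches.
    """
    skeleton = copy.deepcopy(root)
    fragments = {}

    def visit(parent):
        for index, child in enumerate(list(parent)):
            if child.tag in relevant_tags:
                key = str(len(fragments))
                hole = ET.Element(HOLE, {"id": key})
                hole.tail = child.tail      # keep surrounding text in the skeleton
                child.tail = None
                fragments[key] = child
                parent[index] = hole        # one-for-one replacement keeps order
            else:
                visit(child)

    visit(skeleton)
    return skeleton, fragments


def reassemble(skeleton, fragments):
    """Put (possibly modified) fragments back into the holes of the skeleton."""
    result = copy.deepcopy(skeleton)

    def visit(parent):
        for index, child in enumerate(list(parent)):
            if child.tag == HOLE:
                fragment = copy.deepcopy(fragments[child.get("id")])
                fragment.tail = child.tail
                parent[index] = fragment
            else:
                visit(child)

    visit(result)
    return result


def uppercase_sequence(fragment):
    """Hypothetical processor: it only ever sees the fragments it cares about."""
    updated = copy.deepcopy(fragment)
    updated.text = (updated.text or "").upper()
    return updated


doc = ET.fromstring(
    "<project><meta>run 1</meta><seq>acgt</seq><seq>ttga</seq></project>"
)
skeleton, fragments = cut(doc, {"seq"})            # cut
processed = {key: uppercase_sequence(frag)         # only fragments reach the processor;
             for key, frag in fragments.items()}   # the skeleton is shipped past it
result = reassemble(skeleton, processed)           # reassemble downstream
print(ET.tostring(result, encoding="unicode"))
```

In a distributed setting the saving would come from sending only `fragments` to the host running the processor, while `skeleton` travels directly to the next pipeline step; the sketch keeps everything in one process purely to show the data flow.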