X-CSR: Dataflow Optimization for Distributed XML Process Pipelines

  • Authors:
  • Daniel Zinn;Shawn Bowers;Timothy McPhillips;Bertram Ludäscher

  • Venue:
  • ICDE '09 Proceedings of the 2009 IEEE International Conference on Data Engineering
  • Year:
  • 2009

Abstract

XML process networks are a simple yet powerful programming paradigm for loosely coupled, coarse-grained dataflow applications such as data-centric scientific workflows. We describe a framework called Delta-XML that is well-suited for applications in which pipelines of data processors modify parts ("deltas") of XML data collections while keeping the overall collection structure intact. We show how to optimize the execution of Delta-XML process networks by minimizing the data shipping cost in distributed settings. This X-CSR optimization (XML Cut, Ship, and Reassemble; pronounced "X-scissor") employs static type inference based on XML Schema to determine the XML stream fragments that are relevant to a processor, allowing irrelevant fragments to be bypassed ("shipped") to downstream pipeline steps. Finally, we present evaluation results for a real-world scientific workflow, which show the practical feasibility of X-CSR. A long version of this paper is available as a technical report (http://www.cs.ucdavis.edu/research/tech-reports/2008/CSE-2008-15.pdf).
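
The cut, ship, and reassemble idea can be illustrated with a small Python sketch. It is not the authors' implementation. It assumes a stream is a flat list of (path, payload) fragments, that the set of relevant path prefixes is supplied by hand rather than inferred from an XML Schema, and that the processor replaces each relevant fragment one-for-one. The names `cut`, `reassemble`, and `run_step` are hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical fragment representation: (path label, payload).
Fragment = Tuple[str, str]
# A fragment tagged with its position in the original stream.
Indexed = Tuple[int, str, str]


def cut(stream: List[Fragment], relevant_prefixes: Tuple[str, ...]):
    """Split a stream into fragments the processor needs and ones it can skip.

    Each fragment keeps its original position so it can be put back later.
    In X-CSR the relevant paths come from static type inference; here they
    are simply passed in (an assumption of this sketch).
    """
    relevant: List[Indexed] = []
    bypassed: List[Indexed] = []
    for index, (path, payload) in enumerate(stream):
        target = relevant if path.startswith(relevant_prefixes) else bypassed
        target.append((index, path, payload))
    return relevant, bypassed


def reassemble(processed: List[Indexed], bypassed: List[Indexed]) -> List[Fragment]:
    """Merge processed and bypassed fragments back into document order."""
    merged = sorted(processed + bypassed, key=lambda item: item[0])
    return [(path, payload) for _, path, payload in merged]


def run_step(stream: List[Fragment],
             relevant_prefixes: Tuple[str, ...],
             processor: Callable[[str], str]):
    """Run one pipeline step, shipping only the relevant fragments."""
    relevant, bypassed = cut(stream, relevant_prefixes)
    # Only `relevant` would travel to the (possibly remote) processor;
    # `bypassed` goes straight to the downstream reassembly point.
    processed = [(i, path, processor(payload)) for i, path, payload in relevant]
    return reassemble(processed, bypassed), len(relevant), len(bypassed)


if __name__ == "__main__":
    stream = [("/proj/seq", "acgt"), ("/proj/meta", "note"), ("/proj/seq", "ttga")]
    result, shipped, skipped = run_step(stream, ("/proj/seq",), str.upper)
    print(result, shipped, skipped)
```

In this toy setting the saving is the number of bypassed fragments that never reach the processor. The framework described in the abstract works on XML streams and schema types, which the sketch does not attempt to model.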