Composing and executing parallel data-flow graphs with shell pipes

  • Authors:
  • Edward Walker;Weijia Xu;Vinoth Chandar

  • Affiliations:
  • University of Texas at Austin, Austin, TX;University of Texas at Austin, Austin, TX;Oracle Corporation, Redwood Shores, CA

  • Venue:
  • Proceedings of the 4th Workshop on Workflows in Support of Large-Scale Science
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

In this paper we extend the concept of shell pipes to incorporate forks, joins, cycles, and key-value aggregation. These extensions enable the implementation of a class of data-flow computation with strong deterministic properties, and provide a simple yet powerful coordination layer for leveraging multi-language and legacy components for large-scale parallel computation. Concretely, this paper describes the design and implementation of the language extensions in Bourne Again SHell (BASH), and examines the performance of the system using micro and macro benchmarks. The implemented system is shown to scale to thousands of processors, enabling high throughput performance for millions of processing tasks on large commodity compute clusters.