Composing and executing parallel data-flow graphs with shell pipes

Authors:
Edward Walker;Weijia Xu;Vinoth Chandar
Affiliations:
University of Texas at Austin, Austin, TX;University of Texas at Austin, Austin, TX;Oracle Corporation, Redwood Shores, CA
Venue:
Proceedings of the 4th Workshop on Workflows in Support of Large-Scale Science
Year:
2009

Abstract

In this paper we extend the concept of shell pipes to incorporate forks, joins, cycles, and key-value aggregation. These extensions enable the implementation of a class of data-flow computations with strong deterministic properties, and provide a simple yet powerful coordination layer for leveraging multi-language and legacy components in large-scale parallel computation. Concretely, this paper describes the design and implementation of the language extensions in the Bourne Again SHell (BASH), and examines the performance of the system using micro- and macro-benchmarks. The implemented system is shown to scale to thousands of processors, enabling high-throughput performance for millions of processing tasks on large commodity compute clusters.
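
To make the fork, join, and key-value aggregation patterns concrete, the following is a minimal sketch in standard Bash. It does not use the paper's extended pipe syntax, which the abstract does not show. It only approximates the same data-flow shapes with named pipes, tee, and awk on a single machine. The function names aggregate_by_key and fork_join are hypothetical, chosen for illustration, and cycles are not covered.

```shell
#!/usr/bin/env bash
# Illustrative sketch only: plain Bash, NOT the extended syntax from the paper.
# Function names are hypothetical.

# Key-value aggregation: read "key value" lines on stdin and emit one
# "key sum" line per key. The final sort makes the output order deterministic,
# since awk's iteration order over array keys is unspecified.
aggregate_by_key() {
  awk '{ sum[$1] += $2 } END { for (k in sum) print k, sum[k] }' | sort
}

# Fork and join: duplicate stdin into two branches that run concurrently,
# then wait for both and combine their results into one line.
fork_join() {
  local dir
  dir=$(mktemp -d)
  mkfifo "$dir/left" "$dir/right"

  # Branch 1 counts lines; branch 2 counts whitespace-separated words.
  awk 'END { print NR }' < "$dir/left" > "$dir/lines" &
  awk '{ n += NF } END { print n + 0 }' < "$dir/right" > "$dir/words" &

  # Fork: tee copies every input line into both named pipes.
  tee "$dir/left" > "$dir/right"

  # Join: block until both branches have finished, then merge their outputs.
  wait
  echo "lines=$(cat "$dir/lines") words=$(cat "$dir/words")"
  rm -rf "$dir"
}
```

As a usage example, printf 'a 1\nb 2\na 3\n' | aggregate_by_key prints "a 4" and "b 2" on separate lines, and printf 'x y\nz\n' | fork_join prints "lines=2 words=3". Named pipes are used instead of process substitution so that the join step can wait on both branches explicitly before reading their results.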