Turbine: a distributed-memory dataflow engine for extreme-scale many-task applications

  • Authors:
  • Justin M. Wozniak; Timothy G. Armstrong; Ketan Maheshwari; Ewing L. Lusk; Daniel S. Katz; Michael Wilde; Ian T. Foster

  • Affiliations:
  • Argonne National Laboratory, Argonne, IL; University of Chicago, Chicago, IL; Argonne National Laboratory, Argonne, IL; Argonne National Laboratory, Argonne, IL; University of Chicago & Argonne National Laboratory, Chicago, IL; Argonne National Laboratory, Argonne, IL; Argonne National Laboratory, Argonne, IL

  • Venue:
  • Proceedings of the 1st ACM SIGMOD Workshop on Scalable Workflow Execution Engines and Technologies
  • Year:
  • 2012

Abstract

Efficiently utilizing the rapidly increasing concurrency of multi-petaflop computing systems is a significant programming challenge. One approach is to structure applications with an upper layer of many loosely coupled, coarse-grained tasks, each comprising a tightly coupled parallel function or program. "Many-task" programming models such as functional parallel dataflow may be used at the upper layer to generate massive numbers of tasks, each of which generates significant tightly coupled parallelism at the lower level via multithreading, message passing, and/or partitioned global address spaces. At large scales, however, managing task distribution, data dependencies, and inter-task data movement is a significant performance challenge. In this work, we describe Turbine, a new, highly scalable, distributed many-task dataflow engine. Turbine executes a generalized many-task intermediate representation with automated self-distribution and is scalable to multi-petaflop infrastructures. Here we present the architecture of Turbine and its performance on highly concurrent systems.
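The abstract describes tasks whose execution is gated by data dependencies. Below is a minimal single-process Python sketch of that dataflow idea. It assumes single-assignment data items and rules that fire once all of their inputs have been stored. The class and method names (`DataflowEngine`, `rule`, `store`, `run`) are hypothetical and are not Turbine's actual interface. The sketch also omits the distribution of tasks and data across nodes that Turbine provides.

```python
from collections import deque
from typing import Callable, Dict, List


class DataflowEngine:
    """Hypothetical single-process sketch of dataflow task release.

    Each rule declares input data IDs and becomes ready only when every
    input has been stored (closed). This is not Turbine's real API.
    """

    def __init__(self) -> None:
        self.data: Dict[str, object] = {}        # closed (assigned) data items
        self.waiting: Dict[str, List[int]] = {}  # data ID -> rules blocked on it
        self.rules: Dict[int, tuple] = {}        # rule ID -> (pending inputs, action)
        self.ready: deque = deque()              # rule IDs whose inputs are all closed
        self.next_id = 0

    def rule(self, inputs: List[str], action: Callable[["DataflowEngine"], None]) -> None:
        # Record which inputs are still open; a rule with none is ready at once.
        pending = {d for d in inputs if d not in self.data}
        rid = self.next_id
        self.next_id += 1
        self.rules[rid] = (pending, action)
        if not pending:
            self.ready.append(rid)
        for d in pending:
            self.waiting.setdefault(d, []).append(rid)

    def store(self, key: str, value: object) -> None:
        # Single assignment: closing a data item twice is an error.
        if key in self.data:
            raise ValueError(f"{key} already closed (single assignment)")
        self.data[key] = value
        # Notify every rule blocked on this item; release those now satisfied.
        for rid in self.waiting.pop(key, []):
            pending, _ = self.rules[rid]
            pending.discard(key)
            if not pending:
                self.ready.append(rid)

    def run(self) -> None:
        # Execute ready rules; actions may store data that releases more rules.
        while self.ready:
            rid = self.ready.popleft()
            _, action = self.rules.pop(rid)
            action(self)
```

For example, two chained rules compute `d = (a + b) * 2` once `a` and `b` are stored. The second rule fires only after the first one closes `c`:

```python
eng = DataflowEngine()
eng.rule(["a", "b"], lambda e: e.store("c", e.data["a"] + e.data["b"]))
eng.rule(["c"], lambda e: e.store("d", e.data["c"] * 2))
eng.store("a", 1)
eng.store("b", 2)
eng.run()
print(eng.data["d"])  # 6
```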