Turbine: A Distributed-memory Dataflow Engine for High Performance Many-task Applications

  • Authors:
  • Justin M. Wozniak; Timothy G. Armstrong; Ketan Maheshwari; Ewing L. Lusk; Daniel S. Katz; Michael Wilde; Ian T. Foster

  • Affiliations:
  • Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, USA (wozniak@mcs.anl.gov); Computer Science Department, University of Chicago, Chicago, IL, USA (tga@uchicago.edu); Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, USA (ketan@mcs.anl.gov); Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, USA (lusk@mcs.anl.gov); Computation Institute, University of Chicago & Argonne National Laboratory, Chicago, IL, USA (d.katz@ieee.org); Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, USA (wilde@mcs.anl.gov); Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, USA (foster@mcs.anl.gov)

  • Venue:
  • Fundamenta Informaticae - Scalable Workflow Enactment Engines and Technology
  • Year:
  • 2013

Abstract

Efficiently utilizing the rapidly increasing concurrency of multi-petaflop computing systems is a significant programming challenge. One approach is to structure applications with an upper layer of many loosely coupled, coarse-grained tasks, each comprising a tightly coupled parallel function or program. “Many-task” programming models such as functional parallel dataflow may be used at the upper layer to generate massive numbers of tasks, each of which generates significant tightly coupled parallelism at the lower level through multithreading, message passing, and/or partitioned global address spaces. At large scales, however, managing task distribution, data dependencies, and intertask data movement is a significant performance challenge. In this work, we describe Turbine, a new highly scalable and distributed many-task dataflow engine. Turbine executes a generalized many-task intermediate representation with automated self-distribution and is scalable to multi-petaflop infrastructures. We present here the architecture of Turbine and its performance on highly concurrent systems.
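
As a rough illustration of the many-task dataflow idea described in the abstract, the following minimal sketch runs each task as soon as the tasks it depends on have produced their results. It is a single-process, thread-based toy and does not represent Turbine's architecture, its intermediate representation, or its API; the names `run_dataflow`, `simulate`, and the task labels are hypothetical and chosen only for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def run_dataflow(tasks, max_workers=4):
    """Run a task graph in dependency order (illustrative sketch only).

    tasks maps a task name to (function, [names of input tasks]).
    A task is submitted once all of its inputs have results; its
    function receives those results as positional arguments.
    """
    results = {}
    pending = dict(tasks)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending:
            # Tasks whose inputs are all available can run concurrently.
            ready = [name for name, (_, deps) in pending.items()
                     if all(d in results for d in deps)]
            if not ready:
                raise ValueError("cycle or missing dependency in task graph")
            futures = {
                name: pool.submit(pending[name][0],
                                  *[results[d] for d in pending[name][1]])
                for name in ready
            }
            # Collect results for this wave before scheduling the next one.
            for name, future in futures.items():
                results[name] = future.result()
                del pending[name]
    return results


def simulate(seed):
    """Stand-in for a coarse-grained task (hypothetical workload)."""
    return seed * seed


# Four independent tasks feed one reduction task.
tasks = {f"sim{i}": (lambda i=i: simulate(i), []) for i in range(4)}
tasks["total"] = (lambda *xs: sum(xs), [f"sim{i}" for i in range(4)])

out = run_dataflow(tasks)
print(out["total"])
```

In this toy version a single loop tracks all dependencies and dispatches all tasks, which is the kind of centralized management the abstract identifies as a performance challenge at large scale.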