Work stealing and persistence-based load balancers for iterative overdecomposed applications

  • Authors:
  • Jonathan Lifflander;Sriram Krishnamoorthy;Laxmikant V. Kale

  • Affiliations:
  • University of Illinois Urbana-Champaign, Urbana, IL, USA; Pacific Northwest National Lab, Richland, WA, USA; University of Illinois Urbana-Champaign, Urbana, IL, USA

  • Venue:
  • Proceedings of the 21st International Symposium on High-Performance Parallel and Distributed Computing
  • Year:
  • 2012

Abstract

Applications often involve iterative execution of identical or slowly evolving calculations. Such applications require incremental rebalancing to maintain load balance across iterations. In this paper, we consider the design and evaluation of two distinct approaches to this challenge: persistence-based load balancing and work stealing. The work to be performed is overdecomposed into tasks, enabling automatic rebalancing by the middleware. We present a hierarchical persistence-based rebalancing algorithm that performs localized incremental rebalancing. We also present an active-message-based retentive work stealing algorithm optimized for iterative applications on distributed-memory machines. We demonstrate low overheads and high efficiencies on the full NERSC Hopper (146,400 cores) and ALCF Intrepid (163,840 cores) systems, and on up to 128,000 cores on OLCF Titan.
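The persistence principle behind the first approach (loads measured in one iteration predict the next) can be illustrated with a minimal sketch, assuming per-task loads from the previous iteration are available. It uses a plain centralized greedy assignment (largest measured load first, placed on the least-loaded processor), a stand-in chosen only for illustration. It is not the hierarchical, localized, incremental algorithm the paper presents, and the names `greedy_rebalance` and `max_load` are hypothetical.

```python
import heapq
from typing import Dict, List


def greedy_rebalance(task_loads: Dict[int, float], num_procs: int) -> Dict[int, List[int]]:
    """Hypothetical sketch: reassign overdecomposed tasks for the next iteration.

    Assumes task_loads holds each task's load measured in the previous
    iteration (the persistence assumption). Tasks are visited in order of
    decreasing measured load and each is placed on the processor whose
    accumulated load is currently smallest.
    """
    # Min-heap of (accumulated load, processor id).
    heap = [(0.0, p) for p in range(num_procs)]
    heapq.heapify(heap)
    assignment: Dict[int, List[int]] = {p: [] for p in range(num_procs)}
    # Largest load first; ties broken by task id for determinism.
    for task, load in sorted(task_loads.items(), key=lambda kv: (-kv[1], kv[0])):
        proc_load, proc = heapq.heappop(heap)
        assignment[proc].append(task)
        heapq.heappush(heap, (proc_load + load, proc))
    return assignment


def max_load(assignment: Dict[int, List[int]], task_loads: Dict[int, float]) -> float:
    """Predicted load on the most heavily loaded processor."""
    return max(sum(task_loads[t] for t in tasks) for tasks in assignment.values())


if __name__ == "__main__":
    # Hypothetical loads measured during the previous iteration.
    measured = {0: 5.0, 1: 4.0, 2: 3.0, 3: 3.0, 4: 2.0, 5: 1.0}
    plan = greedy_rebalance(measured, num_procs=2)
    print(plan)                      # {0: [0, 3, 5], 1: [1, 2, 4]}
    print(max_load(plan, measured))  # 9.0
```

Because this version gathers every measured load in one place and may migrate many tasks at once, it serves only as a conceptual illustration of persistence-based rebalancing, not as a model of how the approach scales to the core counts reported above.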