Work stealing and persistence-based load balancers for iterative overdecomposed applications

Authors:
Jonathan Lifflander; Sriram Krishnamoorthy; Laxmikant V. Kale
Affiliations:
University of Illinois Urbana-Champaign, Urbana, IL, USA; Pacific Northwest National Lab, Richland, WA, USA; University of Illinois Urbana-Champaign, Urbana, IL, USA
Venue:
Proceedings of the 21st International Symposium on High-Performance Parallel and Distributed Computing
Year:
2012

Abstract

Applications often involve iterative execution of identical or slowly evolving calculations. Such applications require incremental rebalancing to improve load balance across iterations. In this paper, we consider the design and evaluation of two distinct approaches to addressing this challenge: persistence-based load balancing and work stealing. The work to be performed is overdecomposed into tasks, enabling automatic rebalancing by the middleware. We present a hierarchical persistence-based rebalancing algorithm that performs localized incremental rebalancing. We also present an active-message-based retentive work stealing algorithm optimized for iterative applications on distributed memory machines. We demonstrate low overheads and high efficiencies on the full NERSC Hopper (146,400 cores) and ALCF Intrepid systems (163,840 cores), and on up to 128,000 cores on OLCF Titan.
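The idea behind persistence-based rebalancing can be illustrated with a small sketch: task durations measured in one iteration are assumed to predict the next, so they can drive reassignment of the overdecomposed tasks. The code below is not the hierarchical, incremental algorithm presented in the paper. It is a centralized greedy heuristic (heaviest task to the least-loaded processor) that ignores migration cost, and the names `rebalance`, `proc_loads`, and `task_times` are hypothetical.

```python
import heapq


def rebalance(task_times, num_procs):
    """Map each task to a processor using the previous iteration's timings.

    task_times: dict of task id -> measured duration (assumed to persist).
    Returns a dict of task id -> processor index.
    Illustrative greedy heuristic only, not the paper's algorithm.
    """
    # Min-heap of (current load, processor id): the least-loaded processor pops first.
    heap = [(0.0, p) for p in range(num_procs)]
    heapq.heapify(heap)
    assignment = {}
    # Place the heaviest tasks first; ties are broken by task id for determinism.
    for task, cost in sorted(task_times.items(), key=lambda kv: (-kv[1], kv[0])):
        load, proc = heapq.heappop(heap)
        assignment[task] = proc
        heapq.heappush(heap, (load + cost, proc))
    return assignment


def proc_loads(task_times, assignment, num_procs):
    """Sum the measured task durations assigned to each processor."""
    loads = [0.0] * num_procs
    for task, proc in assignment.items():
        loads[proc] += task_times[task]
    return loads
```

In an iterative application, `rebalance` would be called between iterations with the timings just measured. An incremental scheme such as the one the abstract describes would instead move only a few tasks away from overloaded processors, which keeps most tasks in place.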