Quantifying the effectiveness of load balance algorithms

Authors:
Olga Pearce; Todd Gamblin; Bronis R. de Supinski; Martin Schulz; Nancy M. Amato
Affiliations:
Texas A&M University, College Station, TX, USA; Lawrence Livermore National Laboratory, Livermore, CA, USA; Lawrence Livermore National Laboratory, Livermore, CA, USA; Lawrence Livermore National Laboratory, Livermore, CA, USA; Texas A&M University, College Station, TX, USA
Venue:
Proceedings of the 26th ACM international conference on Supercomputing
Year:
2012

Abstract

Load balance is critical for performance in large parallel applications. An imbalance on today's fastest supercomputers can force hundreds of thousands of cores to idle, and on future exascale machines this cost will increase by over a factor of a thousand. Improving load balance requires a detailed understanding of the amount of computational load per process and an application's simulated domain, but no existing metrics sufficiently account for both factors. Current load balance mechanisms are often integrated into applications and make implicit assumptions about the load. Some strategies place the burden of providing accurate load information, including the decision on when to balance, on the application. Existing application-independent mechanisms simply measure the application load without any knowledge of application elements, which limits them to identifying imbalance without correcting it. Our novel load model couples abstract application information with scalable measurements to derive accurate and actionable load metrics. Using these metrics, we develop a cost model for correcting load imbalance. Our model enables comparisons of the effectiveness of load balancing algorithms in any specific imbalance scenario. Our model correctly selects the algorithm that achieves the lowest runtime in up to 96% of the cases, and can achieve a 19% gain over selecting a single balancing algorithm for all cases.
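
To make the idea of comparing balancing algorithms by cost concrete, the following is a minimal illustrative sketch and not the load model or cost model from the paper. It assumes a commonly used imbalance metric (how far the most loaded process exceeds the mean load) and a hypothetical cost function that adds a one-time rebalancing cost to the predicted time of the remaining steps. The names `percent_imbalance`, `Candidate`, `select_balancer`, and all numeric values are assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean


def percent_imbalance(loads):
    """Percent by which the most loaded process exceeds the mean load."""
    avg = mean(loads)
    return (max(loads) / avg - 1.0) * 100.0


@dataclass
class Candidate:
    """A hypothetical balancing option with assumed (not measured) costs."""
    name: str
    rebalance_cost: float      # one-time cost of running the balancer, in seconds
    predicted_max_load: float  # predicted per-step time of the slowest process afterwards


def select_balancer(candidates, remaining_steps):
    """Pick the candidate with the lowest predicted total time.

    Assumed cost function: rebalance cost plus remaining steps times the
    predicted per-step time of the slowest process. Ties go to the first
    candidate in the list.
    """
    def total_time(c):
        return c.rebalance_cost + remaining_steps * c.predicted_max_load

    return min(candidates, key=total_time)


# Example: four processes, one of which carries twice the load of the others.
loads = [4.0, 4.0, 4.0, 8.0]
candidates = [
    Candidate("none", 0.0, max(loads)),  # do nothing, keep the current imbalance
    Candidate("diffusive", 2.0, 6.0),    # cheap, partial correction (made-up numbers)
    Candidate("global", 10.0, 5.0),      # expensive, near-perfect balance (made-up numbers)
]

if __name__ == "__main__":
    print(f"imbalance: {percent_imbalance(loads):.1f}%")
    for steps in (3, 20):
        print(steps, "steps ->", select_balancer(candidates, steps).name)
```

Under these made-up numbers, the cheaper option wins when few steps remain and the more expensive, more thorough option wins over a longer horizon, which is the kind of scenario-dependent trade-off a cost model is meant to expose.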