A Partitioning Strategy for Nonuniform Problems on Multiprocessors
IEEE Transactions on Computers
Dynamic load balancing for distributed memory multiprocessors
Journal of Parallel and Distributed Computing
A parallel hashed Oct-Tree N-body algorithm
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
A parallel adaptive fast multipole method
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Balancing processor loads and exploiting data locality in N-body simulations
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Parallel optimisation algorithms for multilevel mesh partitioning
Parallel Computing - Special issue on graph partioning and parallel computing
A unified algorithm for load-balancing adaptive scientific simulations
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Enhancing scalability of parallel structured AMR calculations
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Scalable Line Dynamics in ParaDiS
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
A Portable Programming Interface for Performance Evaluation on Modern Processors
International Journal of High Performance Computing Applications
Methods of inference and learning for performance modeling of parallel applications
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
BlueGene/L applications: Parallelism On a Massive Scale
International Journal of High Performance Computing Applications
PNMPI tools: a whole lot greater than the sum of their parts
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Scalable load-balance measurement for SPMD codes
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Dynamic topology aware load balancing algorithms for molecular dynamics applications
Proceedings of the 23rd international conference on Supercomputing
New challenges in dynamic load balancing
Applied Numerical Mathematics - Adaptive methods for partial differential equations and large-scale computation
Scalable Identification of Load Imbalance in Parallel Executions Using Call Path Profiles
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Detailed Load Balance Analysis of Large Scale Parallel Applications
ICPP '10 Proceedings of the 2010 39th International Conference on Parallel Processing
A Comparative Analysis of Load Balancing Algorithms Applied to a Weather Forecast Model
SBAC-PAD '10 Proceedings of the 2010 22nd International Symposium on Computer Architecture and High Performance Computing
Scalable fine-grained call path tracing
Proceedings of the international conference on Supercomputing
Periodic hierarchical load balancing for large supercomputers
International Journal of High Performance Computing Applications
Maximizing SIMD resource utilization in GPGPUs with SIMD lane permutation
Proceedings of the 40th Annual International Symposium on Computer Architecture
Bit mapping for balanced PCM cell programming
Proceedings of the 40th Annual International Symposium on Computer Architecture
Hi-index | 0.00 |
Load balance is critical for performance in large parallel applications. An imbalance on today's fastest supercomputers can force hundreds of thousands of cores to idle, and on future exascale machines this cost will increase by over a factor of a thousand. Improving load balance requires a detailed understanding of the amount of computational load per process and an application's simulated domain, but no existing metrics sufficiently account for both factors. Current load balance mechanisms are often integrated into applications and make implicit assumptions about the load. Some strategies place the burden of providing accurate load information, including the decision on when to balance, on the application. Existing application-independent mechanisms simply measure the application load without any knowledge of application elements, which limits them to identifying imbalance without correcting it. Our novel load model couples abstract application information with scalable measurements to derive accurate and actionable load metrics. Using these metrics, we develop a cost model for correcting load imbalance. Our model enables comparisons of the effectiveness of load balancing algorithms in any specific imbalance scenario. Our model correctly selects the algorithm that achieves the lowest runtime in up to 96% of the cases, and can achieve a 19% gain over selecting a single balancing algorithm for all cases.