Large parallel machines with hundreds of thousands of processors are becoming more prevalent. Ensuring good load balance is critical for scaling certain classes of parallel applications, even on thousands of processors. Centralized load balancing algorithms suffer from scalability problems, especially on machines with a relatively small amount of memory. Fully distributed load balancing algorithms, on the other hand, tend to take longer to arrive at good solutions. In this paper, we present an automatic dynamic hierarchical load balancing method that overcomes both the scalability challenges of centralized schemes and the longer running times of traditional distributed schemes. It does so by creating multiple levels of load balancing domains that form a tree. We demonstrate this hierarchical method within a measurement-based load balancing framework in Charm++ and discuss techniques for dealing with the scalability challenges of load balancing at very large scale. We present performance data for the hierarchical load balancing method on up to 16,384 cores of Ranger (at the Texas Advanced Computing Center) and 65,536 cores of Intrepid (the Blue Gene/P at Argonne National Laboratory) for a synthetic benchmark. We also demonstrate the successful deployment of the method in a scientific application, NAMD, with results on Intrepid.
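To make the idea of load balancing domains arranged in a tree more concrete, the following is a minimal two-level sketch in Python. It is an illustration under stated assumptions, not the method evaluated in the paper and not Charm++ code: the function names (`greedy`, `hierarchical_balance`, `max_load`), the fixed `group_size`, the greedy heuristic used inside each group, and the rule for shifting load between groups are all hypothetical choices made for this example. The point it shows is that the root only reasons about per-group totals and surplus objects, while each group balances its own processors independently.

```python
import heapq
from typing import Dict, List, Tuple

Obj = Tuple[str, float]  # (object id, measured load); hypothetical representation


def greedy(objs: List[Obj], procs: List[int]) -> Dict[int, List[Obj]]:
    """Leaf-level step: place objects, heaviest first, on the least-loaded processor."""
    heap = [(0.0, p) for p in procs]
    heapq.heapify(heap)
    out: Dict[int, List[Obj]] = {p: [] for p in procs}
    for obj in sorted(objs, key=lambda o: o[1], reverse=True):
        load, p = heapq.heappop(heap)
        out[p].append(obj)
        heapq.heappush(heap, (load + obj[1], p))
    return out


def hierarchical_balance(placement: Dict[int, List[Obj]],
                         group_size: int) -> Dict[int, List[Obj]]:
    """Two-level sketch: balance across groups at the root, then within each group."""
    procs = sorted(placement)
    groups = [procs[i:i + group_size] for i in range(0, len(procs), group_size)]
    per_group = [[o for p in g for o in placement[p]] for g in groups]
    total = sum(o[1] for objs in per_group for o in objs)
    targets = [total * len(g) / len(procs) for g in groups]

    # Root level: each overloaded group releases its lightest objects into a
    # pool for as long as it stays at or above its target load.
    pool: List[Obj] = []
    loads: List[float] = []
    for objs, target in zip(per_group, targets):
        objs.sort(key=lambda o: o[1], reverse=True)  # lightest object last
        load = sum(o[1] for o in objs)
        while objs and load - objs[-1][1] >= target:
            load -= objs[-1][1]
            pool.append(objs.pop())
        loads.append(load)

    # Pooled objects go, heaviest first, to the group with the largest deficit.
    for obj in sorted(pool, key=lambda o: o[1], reverse=True):
        i = max(range(len(groups)), key=lambda k: targets[k] - loads[k])
        per_group[i].append(obj)
        loads[i] += obj[1]

    # Leaf level: every group balances its own processors independently.
    result: Dict[int, List[Obj]] = {}
    for g, objs in zip(groups, per_group):
        result.update(greedy(objs, g))
    return result


def max_load(placement: Dict[int, List[Obj]]) -> float:
    """Load of the most heavily loaded processor."""
    return max(sum(o[1] for o in objs) for objs in placement.values())


if __name__ == "__main__":
    # Eight unit-load objects all start on processor 0 of a 4-processor machine.
    before = {0: [(f"o{i}", 1.0) for i in range(8)], 1: [], 2: [], 3: []}
    after = hierarchical_balance(before, group_size=2)
    print(max_load(before), max_load(after))
```

In this toy input, the peak processor load drops from 8.0 to 2.0. A real implementation would run the leaf-level steps in parallel on separate group leaders and would have to account for migration and communication costs, which this sketch ignores.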