This paper investigates the overhead of a dynamic load balancing library for large, irregular, data-parallel scientific applications on general-purpose clusters. The library is based on an integrated approach. This approach combines novel dynamic loop scheduling strategies, used as data migration policies, with the resource management and task migration capabilities of a recently developed parallel runtime system. The paper focuses on how much the runtime system software layer contributes to the library's total overhead. The experiments cover two applications that use the library: an N-body simulation and the profiling of a quadrature routine. Their performance is compared with that of the same applications using an MPI-only implementation of the dynamic scheduling techniques. The comparison indicates only a slight decrease in performance due to the overhead of the runtime system software layer. These findings validate the runtime system as an implementation platform for dynamic load balancing schemes. They also underscore the significance of the integrated approach and the benefits of the library, especially for cluster applications with irregular and unpredictable behavior.
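The abstract does not name the specific loop scheduling strategies the library uses. As an illustration only, the sketch below shows factoring (Hummel, Schonberg and Flynn), a well-known dynamic loop scheduling technique of this general kind, in its common FAC2 variant. Under FAC2, each batch hands out half of the remaining iterations, split into equal chunks, one per processor. Chunks therefore start large, which keeps scheduling overhead low, and shrink toward the end of the loop, which helps balance load when iteration costs are irregular. The function name `fac2_chunks` is hypothetical and is not the library's API.

```python
import math


def fac2_chunks(n_iters: int, n_procs: int) -> list[int]:
    """Return chunk sizes in the order a FAC2 scheduler hands them out.

    Hypothetical illustration of factoring (FAC2 variant), not the
    library's actual interface. Each batch covers half of the remaining
    iterations, split into n_procs chunks of ceil(R / (2 * n_procs)).
    """
    if n_iters < 0 or n_procs < 1:
        raise ValueError("need n_iters >= 0 and n_procs >= 1")

    chunks: list[int] = []
    remaining = n_iters
    while remaining > 0:
        # Chunk size for this batch, never smaller than one iteration.
        size = max(1, math.ceil(remaining / (2 * n_procs)))
        # One batch: up to n_procs chunks of the same size; an idle
        # worker would request the next chunk in this list.
        for _ in range(n_procs):
            if remaining == 0:
                break
            chunk = min(size, remaining)
            chunks.append(chunk)
            remaining -= chunk
    return chunks


if __name__ == "__main__":
    print(fac2_chunks(100, 4))
```

For 100 iterations on 4 processors, the batches use chunk sizes 13, 6, 3, 2 and 1, with four chunks per batch. In a master-worker setting, each idle worker would take the next chunk from this sequence.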