A Partitioning Strategy for Nonuniform Problems on Multiprocessors
IEEE Transactions on Computers
Guided self-scheduling: A practical scheduling scheme for parallel supercomputers
IEEE Transactions on Computers
Factoring: a method for scheduling parallel loops
Communications of the ACM
Low-overhead scheduling of nested parallelism
IBM Journal of Research and Development
Astrophysical N-body simulations using hierarchical tree data structures
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
A parallel hashed Oct-Tree N-body algorithm
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
A parallel adaptive fast multipole method
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Sorting on a mesh-connected parallel computer
Communications of the ACM
Scalable parallel formulations of the barnes-hut method for n-body simulations
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Using Processor Affinity in Loop Scheduling on Shared-Memory Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
A Parallel Version of the Fast Multipole Method-Invited Talk
Proceedings of the Third SIAM Conference on Parallel Processing for Scientific Computing
Load-sharing in heterogeneous systems via weighted factoring
Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
Nonlinear array layouts for hierarchical memory systems
ICS '99 Proceedings of the 13th international conference on Supercomputing
Recursive array layouts and fast parallel matrix multiplication
Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
Efficient Representation Scheme for Multidimensional Array Operations
IEEE Transactions on Computers
Recursive Array Layouts and Fast Matrix Multiplication
IEEE Transactions on Parallel and Distributed Systems
Load Balancing Highly Irregular Computations with the Adaptive Factoring
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Performance of Scheduling Scientific Applications with Adaptive Weighted Factoring
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
IEEE Transactions on Parallel and Distributed Systems
Message-passing parallel adaptive quantum trajectory method
High performance scientific and engineering computing
Overhead Analysis of a Dynamic Load Balancing Library for Cluster Computing
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 1 - Volume 02
Design and implementation of a novel dynamic load balancing library for cluster computing
Parallel Computing - Heterogeneous computing
A Load Balancing Tool for Distributed Parallel Loops
Cluster Computing
Dynamic load balancing with adaptive factoring methods in scientific applications
The Journal of Supercomputing
Performance evaluation of a dynamic load-balancing library for cluster computing
International Journal of Computational Science and Engineering
Data exploration of turbulence simulations using a database cluster
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Quantifying the effectiveness of load balance algorithms
Proceedings of the 26th ACM international conference on Supercomputing
Hi-index | 0.00 |
Although N-body simulation algorithms are amenable to parallelization, performance gains from execution on parallel machines are difficult to obtain due to load imbalances caused by irregular distributions of bodies. In general, there is a tension between balancing processor loads and maintaining locality, as the dynamic re-assignment of work necessitates access to remote data. Fractiling is a dynamic scheduling scheme that simultaneously balances processor loads and maintains locality by exploiting the self-similarity properties of fractals. Fractiling is based on a probabilistic analysis, and thus, accommodates load imbalances caused by predictable phenomena, such as irregular data, and unpredictable phenomena, such as data-access latencies. In experiments on a KSR1, performance of N-body simulation codes were improved by as much as 53% by fractiling. Performance improvements were obtained on uniform and nonuniform distributions of bodies, underscoring the need for a scheduling scheme that accommodates system induced variance. As the fractiling scheme is orthogonal to the N-body algorithm, we could use simple codes that discretize space into equal-size subrectangles (2-d) or subcubes (3-d) as the base algorithms.