Balancing processor loads and exploiting data locality in N-body simulations

Authors:
Ioana Banicescu; Susan Flynn Hummel
Affiliations:
Polytechnic University, Six MetroTech Center, Brooklyn, NY; Polytechnic University and IBM T. J. Watson Research Center
Venue:
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Year:
1995

Abstract

Although N-body simulation algorithms are amenable to parallelization, performance gains on parallel machines are difficult to obtain because irregular distributions of bodies cause load imbalances. In general, there is a tension between balancing processor loads and maintaining locality, since dynamically reassigning work requires access to remote data. Fractiling is a dynamic scheduling scheme that simultaneously balances processor loads and maintains locality by exploiting the self-similarity properties of fractals. Because fractiling is based on a probabilistic analysis, it accommodates load imbalances caused both by predictable phenomena, such as irregular data, and by unpredictable phenomena, such as data-access latencies. In experiments on a KSR1, fractiling improved the performance of N-body simulation codes by as much as 53%. Performance improvements were obtained on both uniform and nonuniform distributions of bodies, underscoring the need for a scheduling scheme that accommodates system-induced variance. As the fractiling scheme is orthogonal to the N-body algorithm, we could use simple codes that discretize space into equal-size subrectangles (2-d) or subcubes (3-d) as the base algorithms.
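The abstract does not give fractiling's details, so the following is only a rough sketch of the general idea it describes, not the authors' implementation. It assumes a 2-d grid of equal-size subrectangles ordered along a Z-order (Morton) curve, so that consecutive cells are spatially close, and it hands out work in batches of decreasing chunk size, with each batch covering about half of the remaining cells (a factoring-style rule). The names `morton_index` and `halving_chunk_sizes`, the grid size, and the worker count are all hypothetical.

```python
from math import ceil


def morton_index(x, y, bits):
    """Interleave the low `bits` bits of x and y into a Z-order (Morton) index."""
    index = 0
    for b in range(bits):
        index |= ((x >> b) & 1) << (2 * b)        # x bits go to even positions
        index |= ((y >> b) & 1) << (2 * b + 1)    # y bits go to odd positions
    return index


def halving_chunk_sizes(total, workers):
    """Return chunk sizes where each batch of `workers` chunks takes
    roughly half of the work that is still unassigned (assumed rule)."""
    sizes = []
    remaining = total
    while remaining > 0:
        chunk = max(1, ceil(remaining / (2 * workers)))
        for _ in range(workers):
            if remaining == 0:
                break
            take = min(chunk, remaining)
            sizes.append(take)
            remaining -= take
    return sizes


# A 4 x 4 grid of equal-size subrectangles, sorted along the Z-curve.
cells = sorted(
    ((x, y) for x in range(4) for y in range(4)),
    key=lambda c: morton_index(c[0], c[1], bits=2),
)

# Chunk sizes for 16 cells shared between 2 workers.
sizes = halving_chunk_sizes(len(cells), workers=2)

# Slice the Z-ordered cells into chunks; early chunks are whole quadrants.
chunks = []
start = 0
for size in sizes:
    chunks.append(cells[start:start + size])
    start += size
```

In this sketch the large early chunks are spatially compact blocks of cells, which keeps most accesses local, while the small late chunks let an idle worker pick up leftover work and even out the load.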