In this paper, we present two new parallel formulations of the Barnes-Hut method. Both are especially suited to simulations with irregular particle densities. The first formulation uses a static partitioning of the domain and a static assignment of subdomains to processors. We demonstrate that this scheme delivers acceptable load balance and that, coupled with two collective communication operations, it yields good performance. The second formulation combines static decomposition of the domain with an assignment of subdomains to processors based on Morton ordering, which alleviates the load imbalance inherent in the first scheme. This second formulation is inspired by two of the best currently known parallel algorithms for the Barnes-Hut method. We present an experimental evaluation of both schemes on a 256-processor nCUBE2 parallel computer for an astrophysical simulation.
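The abstract does not spell out how subdomains are mapped to processors, so the following is only a minimal sketch of one common way a Morton-ordered assignment can work, not the authors' implementation. It assumes a 2D grid of subdomains for brevity (the simulation in the paper is astrophysical and presumably 3D), a per-subdomain load estimate supplied by the caller, and hypothetical function names `morton_key_2d` and `assign_subdomains`. Subdomains are sorted along the Morton (Z-order) curve, and each processor receives a contiguous run of that curve carrying roughly an equal share of the total load.

```python
def morton_key_2d(ix, iy, bits):
    """Interleave the low `bits` bits of ix and iy into one Morton key.

    Bit b of ix goes to key bit 2*b, bit b of iy to key bit 2*b + 1.
    """
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (2 * b)
        key |= ((iy >> b) & 1) << (2 * b + 1)
    return key


def assign_subdomains(loads, bits, num_procs):
    """Map each subdomain to a processor rank along the Morton curve.

    loads: dict mapping (ix, iy) -> estimated work (an assumed input,
           e.g. particle count or measured interaction count).
    Returns a dict mapping (ix, iy) -> rank in [0, num_procs).
    """
    # Visit subdomains in Morton order so that each rank gets a
    # contiguous, spatially compact run of cells.
    cells = sorted(loads, key=lambda c: morton_key_2d(c[0], c[1], bits))
    total = sum(loads.values())
    owner = {}
    running = 0.0
    for cell in cells:
        if total > 0:
            # Use the midpoint of this cell's interval in the load
            # prefix sum to decide which equal-load slice it falls in.
            mid = running + loads[cell] / 2.0
            rank = min(int(mid * num_procs / total), num_procs - 1)
        else:
            rank = 0
        owner[cell] = rank
        running += loads[cell]
    return owner


if __name__ == "__main__":
    # 4x4 grid of subdomains with uniform load, split over 4 processors.
    grid_loads = {(ix, iy): 1.0 for ix in range(4) for iy in range(4)}
    mapping = assign_subdomains(grid_loads, bits=2, num_procs=4)
    for iy in range(4):
        print([mapping[(ix, iy)] for ix in range(4)])
```

With uniform loads each processor ends up owning one quadrant of the grid. When the loads are irregular, the runs along the curve change length instead, which is how an ordering-based assignment can even out work that a fixed subdomain-to-processor map would leave unbalanced.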