This paper describes our experiences developing high-performance code for astrophysical N-body simulations. Recent N-body methods are based on an adaptive tree structure. The tree must be built and maintained across physically distributed memory; moreover, the communication requirements are irregular and adaptive. Together with the need to balance the computational workload among processors, these issues pose interesting challenges and tradeoffs for high-performance implementation.

Our implementation was guided by the need to keep solutions simple and general. We use a technique for implicitly representing a dynamic global tree across multiple processors, which substantially reduces both the programming complexity and the performance overheads of distributed memory architectures. The contributions include methods, justified both theoretically and experimentally, to vectorize the computation and minimize communication time.

The code has been tested by varying the number and distribution of bodies on different configurations of the Connection Machine CM-5. The overall performance on instances with 10 million bodies is typically over 30% of the peak machine rate. Preliminary timings compare favorably with other approaches.
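To make the "adaptive tree structure" mentioned in the abstract concrete, the following is a minimal serial sketch in Python of a Barnes-Hut-style quadtree in two dimensions. It is not the paper's implementation: the abstract does not name the specific method, and the paper's code distributes the tree across CM-5 processors, which this sketch does not attempt. The names `Node`, `build_tree`, and `accel`, the opening parameter `theta`, the softening length `eps`, and the choice of units with G = 1 are all assumptions made for illustration. Bodies are assumed to have distinct positions inside the unit square.

```python
class Node:
    """One square cell of an adaptive quadtree (hypothetical layout)."""

    def __init__(self, cx, cy, half):
        self.cx, self.cy, self.half = cx, cy, half  # cell centre and half-width
        self.mass = 0.0            # total mass of bodies inside the cell
        self.mx = self.my = 0.0    # mass-weighted coordinate sums
        self.body = None           # (x, y, m) while the cell holds one body
        self.children = None       # four sub-cells once the cell is split

    def _split(self):
        h = self.half / 2
        self.children = [Node(self.cx + dx * h, self.cy + dy * h, h)
                         for dy in (-1, 1) for dx in (-1, 1)]

    def _child(self, x, y):
        # Index matches the order produced by _split.
        return self.children[(x >= self.cx) + 2 * (y >= self.cy)]

    def insert(self, x, y, m):
        if self.children is None and self.body is None:
            self.body = (x, y, m)            # empty leaf: store the body here
        else:
            if self.children is None:        # occupied leaf: subdivide it
                self._split()
                bx, by, bm = self.body
                self.body = None
                self._child(bx, by).insert(bx, by, bm)
            self._child(x, y).insert(x, y, m)
        # Every cell on the insertion path accumulates the new body.
        self.mass += m
        self.mx += m * x
        self.my += m * y


def build_tree(bodies, cx=0.5, cy=0.5, half=0.5):
    """Build the tree over bodies given as (x, y, mass) tuples."""
    root = Node(cx, cy, half)
    for x, y, m in bodies:
        root.insert(x, y, m)
    return root


def accel(node, x, y, theta=0.5, eps=1e-3):
    """Softened gravitational acceleration at (x, y), with G = 1 assumed."""
    if node.mass == 0.0:
        return 0.0, 0.0
    dx = node.mx / node.mass - x   # offset to the cell's centre of mass
    dy = node.my / node.mass - y
    r2 = dx * dx + dy * dy
    # Use the cell as a single point mass if it is a leaf, or if it is
    # small relative to its distance (width / distance < theta).
    if node.children is None or (2 * node.half) ** 2 < theta * theta * r2:
        f = node.mass / (r2 + eps * eps) ** 1.5
        return f * dx, f * dy
    ax = ay = 0.0
    for child in node.children:    # otherwise open the cell and recurse
        cax, cay = accel(child, x, y, theta, eps)
        ax += cax
        ay += cay
    return ax, ay
```

With `theta=0.0` no internal cell is ever accepted as a point mass, so the traversal reduces to a direct sum over all bodies; larger values of `theta` trade accuracy for fewer cell visits. The softening term makes a body's contribution to its own acceleration exactly zero, so no explicit self-exclusion is needed in this sketch.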