An O(N log N) hypercube N-body integrator

Authors:
M. Warren; J. Salmon
Affiliations:
California Institute of Technology, Pasadena, CA; California Institute of Technology, Pasadena, CA
Venue:
C3P Proceedings of the third conference on Hypercube concurrent computers and applications - Volume 2
Year:
1989

References:
Combinatorial optimization: algorithms and complexity.
A fast algorithm for particle simulations. Journal of Computational Physics.
Solving problems on concurrent processors. Vol. 1: General techniques and regular problems.
The art of computer programming, volume 3: (2nd ed.) sorting and searching.
A 3-dimensional representation for fast rendering of complex scenes. SIGGRAPH '80 Proceedings of the 7th annual conference on Computer graphics and interactive techniques.

Cited by:
What have we learnt from using real parallel machines to solve real problems? C3P Proceedings of the third conference on Hypercube concurrent computers and applications - Volume 2.
The design and evaluation of a shared object system for distributed memory machines. OSDI '94 Proceedings of the 1st USENIX conference on Operating Systems Design and Implementation.

Abstract

The gravitational N-body algorithm of Barnes and Hut [1] has been successfully implemented on a hypercube concurrent processor. The novel approach of their sequential algorithm has proven well suited to hypercube architectures. The sequential code achieves O(N log N) complexity by recursively dividing space into subcells, thereby creating a hierarchical grouping of particles. Computing interactions between these groups, rather than between individual particles, dramatically reduces both the amount of communication between processors and the number of force calculations. Parallelism is achieved through an irregular spatial grid decomposition. Because the decomposition topology is irregular, a general loosely synchronous communication routine has been developed. Operations are simplified if the conventional Gray code decomposition is modified so that its bits are taken alternately from each Cartesian dimension. A speedup of 180 has been achieved for a 500,000-particle two-dimensional calculation on 256 processors, and a speedup of 65 for a 64,000-particle three-dimensional calculation on 256 processors.
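The hierarchical grouping of particles can be illustrated with a minimal two-dimensional Barnes-Hut sketch. This is not the authors' hypercube code: it is a sequential Python quadtree written under stated assumptions (gravitational constant G = 1, Plummer softening `eps`, opening parameter `theta`, monopole-only cells, and distinct particle positions). The names `Cell`, `accel` and `direct` are hypothetical. A cell of side s is treated as a single point mass at its centre of mass when s < theta * d, where d is its distance from the test point; otherwise the cell is opened and its children are visited.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

G = 1.0  # gravitational constant in simulation units (assumed)


@dataclass
class Cell:
    """Square quadtree cell centred at (cx, cy) with half side length `half`."""
    cx: float
    cy: float
    half: float
    mass: float = 0.0  # total mass of all particles in the cell
    mx: float = 0.0    # sum of m*x; divided by mass gives the centre of mass
    my: float = 0.0    # sum of m*y
    body: Optional[Tuple[float, float, float]] = None  # (x, y, m) for a leaf holding one particle
    children: Optional[List["Cell"]] = None

    def quadrant(self, x: float, y: float) -> int:
        # Child index: bit 0 set for the right half, bit 1 set for the upper half.
        return (1 if x >= self.cx else 0) + (2 if y >= self.cy else 0)

    def insert(self, x: float, y: float, m: float) -> None:
        if self.children is None and self.body is None:
            self.body = (x, y, m)  # empty leaf: store the particle here
        else:
            if self.children is None:
                self._split()      # occupied leaf: subdivide before descending
            self.children[self.quadrant(x, y)].insert(x, y, m)
        self.mass += m
        self.mx += m * x
        self.my += m * y

    def _split(self) -> None:
        # Four subcells ordered to match quadrant(); the resident particle moves down.
        h = self.half / 2
        self.children = [Cell(self.cx + dx * h, self.cy + dy * h, h)
                         for dy in (-1, 1) for dx in (-1, 1)]
        bx, by, bm = self.body
        self.body = None
        self.children[self.quadrant(bx, by)].insert(bx, by, bm)


def accel(cell: Cell, x: float, y: float,
          theta: float = 0.5, eps: float = 1e-3) -> Tuple[float, float]:
    """Softened acceleration at (x, y) from all particles in `cell`."""
    if cell.mass == 0.0:
        return 0.0, 0.0
    dx = cell.mx / cell.mass - x
    dy = cell.my / cell.mass - y
    d2 = dx * dx + dy * dy
    # Opening criterion on the unsoftened distance. For theta < 1/sqrt(2) a cell
    # that contains (x, y) is always opened, so a particle never attracts itself.
    if cell.children is None or 2 * cell.half < theta * math.sqrt(d2):
        if cell.body is not None and (cell.body[0], cell.body[1]) == (x, y):
            return 0.0, 0.0  # skip self-interaction at the leaf
        r2 = d2 + eps * eps
        f = G * cell.mass / (r2 * math.sqrt(r2))
        return f * dx, f * dy
    ax = ay = 0.0
    for child in cell.children:
        cax, cay = accel(child, x, y, theta, eps)
        ax += cax
        ay += cay
    return ax, ay


def direct(bodies: List[Tuple[float, float, float]], x: float, y: float,
           eps: float = 1e-3) -> Tuple[float, float]:
    """O(N) reference sum over every other particle, for checking the tree."""
    ax = ay = 0.0
    for bx, by, bm in bodies:
        if (bx, by) == (x, y):
            continue
        dx, dy = bx - x, by - y
        r2 = dx * dx + dy * dy + eps * eps
        f = G * bm / (r2 * math.sqrt(r2))
        ax += f * dx
        ay += f * dy
    return ax, ay


random.seed(1)
bodies = [(random.random(), random.random(), 1.0) for _ in range(200)]
root = Cell(0.5, 0.5, 0.5)  # covers the unit square
for b in bodies:
    root.insert(*b)

x0, y0, _ = bodies[0]
print(accel(root, x0, y0, theta=0.5), direct(bodies, x0, y0))
```

With `theta=0` every cell is opened down to its leaves and the tree reproduces the direct sum; larger `theta` accepts more groups, trading accuracy for fewer interactions.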
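One possible reading of the modified Gray code decomposition is sketched below, as an illustrative assumption rather than the authors' code: Gray-code each grid coordinate, then interleave the bits of the node number across dimensions (x, y, z, x, y, z, ...) instead of concatenating one block of bits per dimension. The function names `gray` and `interleaved_gray` are hypothetical, and each dimension is assumed to have `2**bits` positions. Two properties are easy to check. Grid neighbours along any axis differ in exactly one bit, so they are hypercube neighbours. The most significant bit of the node number also splits the domain in half along x, the next bit along y, and so on, which is plausibly why this ordering suits a recursive spatial subdivision.

```python
def gray(n: int) -> int:
    """Binary-reflected Gray code of n."""
    return n ^ (n >> 1)


def interleaved_gray(coords, bits: int) -> int:
    """Hypercube node number for a grid position (hypothetical helper).

    coords: tuple of grid coordinates, each in range(2**bits).
    Each coordinate is Gray-coded, then the codes' bits are interleaved from
    most significant to least, cycling through the dimensions in order.
    """
    codes = [gray(c) for c in coords]
    node = 0
    for b in range(bits - 1, -1, -1):
        for code in codes:
            node = (node << 1) | ((code >> b) & 1)
    return node


print(interleaved_gray((1, 0), bits=2))  # -> 2
print(interleaved_gray((2, 0), bits=2))  # -> 10
```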
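Expressed as parallel efficiency, using the usual definition E = S/P (speedup divided by processor count; this definition is not stated in the abstract), the reported figures work out as follows:

```latex
E = \frac{S}{P}, \qquad
E_{\text{2D}} = \frac{180}{256} \approx 0.70, \qquad
E_{\text{3D}} = \frac{65}{256} \approx 0.25
```

That is, roughly 70% and 25% of ideal linear speedup on 256 processors for the two- and three-dimensional runs respectively.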