We present the implementation and performance of a new gravitational N-body tree-code specifically designed for the graphics processing unit (GPU). All parts of the tree-code algorithm are executed on the GPU. We present algorithms for the parallel construction and traversal of sparse octrees. These algorithms are implemented in CUDA and tested on NVIDIA GPUs, but they are portable to OpenCL and can easily be used on many-core devices from other manufacturers. This portability is achieved by using general parallel-scan and sort methods. The gravitational tree-code outperforms tuned CPU code during tree construction and shows an overall performance improvement of more than a factor of 20, resulting in a processing rate of more than 2.8 million particles per second.
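
To illustrate how sort and scan primitives can drive sparse octree construction, here is a minimal sequential sketch in Python. It is not the paper's CUDA implementation. The use of Morton (Z-order) keys, the 10-bit quantization, and the names `morton_key`, `exclusive_scan`, and `cells_at_level` are all assumptions made for this example. The idea shown is that once particles are sorted by key, the non-empty cells at a given level are the runs of equal key prefixes, and an exclusive scan over "run starts here" flags gives each cell its slot in a compact output array.

```python
BITS = 10  # assumed quantization: 10 bits per coordinate


def morton_key(ix, iy, iz, bits=BITS):
    """Interleave the bits of three integer coordinates (x lowest)."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (3 * b)
        key |= ((iy >> b) & 1) << (3 * b + 1)
        key |= ((iz >> b) & 1) << (3 * b + 2)
    return key


def quantize(x, bits=BITS):
    """Map a coordinate in [0, 1) to an integer grid index."""
    return min(int(x * (1 << bits)), (1 << bits) - 1)


def exclusive_scan(flags):
    """Sequential stand-in for a parallel exclusive prefix sum."""
    out, total = [], 0
    for f in flags:
        out.append(total)
        total += f
    return out, total


def cells_at_level(sorted_keys, level, bits=BITS):
    """Return (cell_prefix, first_particle_index) for each non-empty cell."""
    shift = 3 * (bits - level)
    prefixes = [k >> shift for k in sorted_keys]
    # Flag the first particle of every run of equal prefixes.
    flags = [
        1 if i == 0 or prefixes[i] != prefixes[i - 1] else 0
        for i in range(len(prefixes))
    ]
    # The scan turns the flags into compact output positions.
    offsets, n_cells = exclusive_scan(flags)
    cells = [None] * n_cells
    for i, flag in enumerate(flags):
        if flag:
            cells[offsets[i]] = (prefixes[i], i)
    return cells


# Hypothetical particle positions in the unit cube.
positions = [(0.1, 0.1, 0.1), (0.9, 0.9, 0.9), (0.2, 0.2, 0.2), (0.6, 0.1, 0.1)]
keys = sorted(morton_key(*(quantize(c) for c in p)) for p in positions)
level1_cells = cells_at_level(keys, level=1)
```

Each step here (key computation, sort, flagging, scan, scatter) maps onto a data-parallel primitive, which is consistent with the abstract's remark that relying on general scan and sort methods keeps the approach portable. Empty octants never produce a cell, so the resulting tree is sparse.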