Actors: a model of concurrent computation in distributed systems
Actors: a model of concurrent computation in distributed systems
Astrophysical N-body simulations using hierarchical tree data structures
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
$7.0/Mflops astrophysical N-body simulation with treecode on GRAPE-5
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Scalable parallel formulations of the barnes-hut method for n-body simulations
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Performance evaluation and tuning of GRAPE-6 - towards 40 "real" Tflops
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Adapting a message-driven parallel application to GPU-accelerated clusters
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
A massively parallel adaptive fast-multipole method on heterogeneous architectures
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Scalable cosmological simulations on parallel machines
VECPAR'06 Proceedings of the 7th international conference on High performance computing for computational science
Journal of Parallel and Distributed Computing
CPU/GPU computing for long-wave radiation physics on large GPU clusters
Computers & Geosciences
Study of hierarchical n-body methods for network-on-chip architectures
Euro-Par'11 Proceedings of the 2011 international conference on Parallel Processing - Volume 2
Enabling and scaling matrix computations on heterogeneous multi-core and multi-GPU systems
Proceedings of the 26th ACM international conference on Supercomputing
Scaling fast multipole methods up to 4000 GPUs
Proceedings of the ATIP/A*CRC Workshop on Accelerator Technologies for High-Performance Computing: Does Asia Lead the Way?
Scaling large-data computations on multi-GPU accelerators
Proceedings of the 27th international ACM conference on International conference on supercomputing
G-Charm: an adaptive runtime system for message-driven parallel applications on hybrid systems
Proceedings of the 27th international ACM conference on International conference on supercomputing
A hybrid parallel barnes-hut algorithm for GPU and multicore architectures
Euro-Par'13 Proceedings of the 19th international conference on Parallel Processing
Hi-index | 0.00 |
This paper focuses on the use of GPGPU-based clusters for hierarchical N-body simulations. Whereas the behavior of these hierarchical methods has been studied in the past on CPU-based architectures, we investigate key performance issues in the context of clusters of GPUs. These include kernel organization and efficiency, the balance between tree traversal and force computation work, grain size selection through the tuning of offloaded work request sizes, and the reduction of sequential bottlenecks. The effects of various application parameters are studied and experiments done to quantify gains in performance. Our studies are carried out in the context of a production-quality parallel cosmological simulator called ChaNGa. We highlight the re-engineering of the application to make it more suitable for GPU-based environments. Finally, we present performance results from experiments on the NCSA Lincoln GPU cluster, including a note on GPU use in multistepped simulations.