We present the results of a hierarchical N-body simulation on DEGIMA, a cluster of PCs with 576 graphics processing units (GPUs) connected by an InfiniBand interconnect. DEGIMA (DEstination for GPU Intensive MAchine) is located at the Nagasaki Advanced Computing Center (NACC), Nagasaki University, and is composed of 144 nodes hosting 576 GT200 GPUs; in this work we upgraded its interconnect to InfiniBand. An astrophysical N-body simulation with 3,278,982,596 particles using a treecode algorithm sustains 190.5 Tflops on DEGIMA. The overall hardware cost was $411,921. The maximum corrected performance is 104.8 Tflops for the simulation, yielding a cost performance of 254.4 MFlops/$. This correction counts only the FLOPS required by the most efficient CPU algorithm; any extra FLOPS arising from the GPU implementation and parameter differences are excluded from the 254.4 MFlops/$ figure.
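
As a quick sanity check, the quoted cost-performance figure follows directly from the corrected performance and the hardware cost given above. The short Python sketch below reproduces the arithmetic; the numbers are taken from the abstract, while the function name is illustrative only and not from the original work.

# Minimal sketch reproducing the cost-performance arithmetic quoted above.
# Figures are from the abstract; the helper name is our own invention.
def cost_performance_mflops_per_dollar(sustained_flops, hardware_cost_usd):
    """Return cost performance in MFlops per dollar."""
    return sustained_flops / hardware_cost_usd / 1.0e6

corrected_flops = 104.8e12  # corrected sustained performance: 104.8 Tflops
hardware_cost = 411_921     # total hardware cost in US dollars

print(cost_performance_mflops_per_dollar(corrected_flops, hardware_cost))
# prints ~254.4, matching the 254.4 MFlops/$ quoted in the abstract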