We present timing and performance numbers for SPaSM, a short-range parallel molecular dynamics (MD) code that has been rewritten for the heterogeneous Roadrunner supercomputer. Each Roadrunner compute node consists of two dual-core AMD Opteron microprocessors and four PowerXCell 8i enhanced Cell microprocessors, so there are four MPI ranks per node, each with one Opteron core and one Cell. The interatomic forces are computed on the Cells (each with one PPU and eight SPU cores), while the Opterons direct inter-rank communication and perform I/O-heavy periodic analysis, visualization, and checkpointing tasks. Our initial implementation of a standard Lennard-Jones pair potential benchmark reached 369 Tflop/s of double-precision floating-point performance on the full Roadrunner system (27.7% of peak), corresponding to 124 Mflop/s per watt at a price of approximately 3.69 Mflop/s per dollar. We demonstrate an initial target application, the jetting and ejection of material from a shocked surface.
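The efficiency figures above can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the original work: it takes the four quoted numbers as inputs and derives the system peak, power draw, and system cost they imply. The function name `implied_totals` and the dictionary keys are hypothetical, and the derived values are only as precise as the rounded figures in the abstract.

```python
# Figures quoted in the abstract (all other values below are derived from them).
SUSTAINED_FLOPS = 369e12     # 369 Tflop/s, double precision
FRACTION_OF_PEAK = 0.277     # 27.7% of peak
FLOPS_PER_WATT = 124e6       # 124 Mflop/s per watt
FLOPS_PER_DOLLAR = 3.69e6    # 3.69 Mflop/s per dollar


def implied_totals(sustained, fraction_of_peak, per_watt, per_dollar):
    """Derive the system-level totals implied by the quoted ratios.

    Hypothetical helper for illustration; it only divides the sustained
    rate by each quoted ratio.
    """
    return {
        "peak_flops": sustained / fraction_of_peak,  # implied system peak
        "power_watts": sustained / per_watt,         # implied power draw
        "cost_dollars": sustained / per_dollar,      # implied system price
    }


totals = implied_totals(
    SUSTAINED_FLOPS, FRACTION_OF_PEAK, FLOPS_PER_WATT, FLOPS_PER_DOLLAR
)

print(f"Implied peak:  {totals['peak_flops'] / 1e15:.2f} Pflop/s")
print(f"Implied power: {totals['power_watts'] / 1e6:.2f} MW")
print(f"Implied cost:  {totals['cost_dollars'] / 1e6:.0f} million dollars")
```

Under these inputs the quoted ratios imply a system peak of roughly 1.33 Pflop/s, a power draw of about 3.0 MW, and a price of about 100 million dollars. These are back-calculated estimates, not figures stated in the abstract.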