Journal of Computational Physics
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
PSINS: An Open Source Event Tracer and Execution Simulator for MPI Applications
Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
High-order finite-element seismic wave propagation modeling with MPI on a large GPU cluster
Journal of Computational Physics
Journal of Computational Physics
Scalable Graph Exploration on Multicore Processors
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Algorithms and data structures for massively parallel generic adaptive finite element codes
ACM Transactions on Mathematical Software (TOMS)
The Journal of Supercomputing
Extreme-scale UQ for Bayesian inverse problems governed by PDEs
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Micro-applications for communication data access patterns and MPI datatypes
EuroMPI'12 Proceedings of the 19th European conference on Recent Advances in the Message Passing Interface
Journal of Computational Physics
MPI datatype processing using runtime compilation
Proceedings of the 20th European MPI Users' Group Meeting
Hi-index | 0.01 |
SPECFEM3D_GLOBE is a spectral-element application enabling the simulation of global seismic wave propagation in 3D anelastic, anisotropic, rotating and self-gravitating Earth models at unprecedented resolution. A fundamental challenge in global seismology is to model the propagation of waves with periods between 1 and 2 seconds, the highest frequency signals that can propagate clear across the Earth. These waves help reveal the 3D structure of the Earth's deep interior and can be compared to seismographic recordings. We broke the 2 second barrier using the 62K processor Ranger system at TACC. Indeed we broke the barrier using just half of Ranger, by reaching a period of 1.84 seconds with sustained 28.7 Tflops on 32K processors. We obtained similar results on the XT4 Franklin system at NERSC and the XT4 Kraken system at University of Tennessee Knoxville, while a similar run on the 28K processor Jaguar system at ORNL, which has better memory bandwidth per processor, sustained 35.7 Tflops (a higher flops rate) with a 1.94 shortest period. Thus we have enabled a powerful new tool for seismic wave simulation, one that operates in the same frequency regimes as nature; in seismology there is no need to pursue periods much smaller because higher frequency signals do not propagate across the entire globe. We employed performance modeling methods to identify performance bottlenecks and worked through issues of parallel I/O and scalability. Improved mesh design and numbering results in excellent load balancing and few cache misses. The primary achievements are not just the scalability and high teraflops number, but a historic step towards understanding the physics and chemistry of the Earth's interior at unprecedented resolution.