Quantifying performance benefits of overlap using MPI-2 in a seismic modeling application

Authors:
Sreeram Potluri;Ping Lai;Karen Tomko;Sayantan Sur;Yifeng Cui;Mahidhar Tatineni;Karl W. Schulz;William L. Barth;Amitava Majumdar;Dhabhaleswar K. Panda
Affiliations:
The Ohio State University, Columbus, OH;The Ohio State University, Columbus, OH;Ohio Supercomputer Center, Columbus, OH;The Ohio State University, Columbus, OH;San Diego Supercomputer Center, San Diego, California;San Diego Supercomputer Center, San Diego, California;Texas Advanced Computing Center, Austin, Texas;Texas Advanced Computing Center, Austin, Texas;San Diego Supercomputer Center, San Diego, California;The Ohio State University, Columbus, OH
Venue:
Proceedings of the 24th ACM International Conference on Supercomputing
Year:
2010

Abstract

AWM-Olsen is a widely used ground-motion simulation code based on a parallel finite-difference solution of the 3-D velocity-stress wave equation. The application runs on tens of thousands of cores and consumes several million CPU hours on TeraGrid clusters every year. A significant portion of its run time (37% in a 4,096-process run) is spent in MPI communication routines. Good performance therefore requires an optimized communication design, coupled with a low-latency, high-bandwidth network and an efficient communication subsystem. In this paper, we analyze the application's performance bottlenecks with respect to the time spent in MPI communication calls, and we find that much of this time can be overlapped with computation using MPI non-blocking calls. We use both two-sided and MPI-2 one-sided communication semantics to redesign the communication in AWM-Olsen. With our new design based on MPI-2 one-sided communication semantics, the entire application is sped up by 12% at 4K processes and by 10% at 8K processes on Ranger, a state-of-the-art InfiniBand cluster at the Texas Advanced Computing Center (TACC).
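Overlap of this kind is commonly obtained by reordering each time step so that work not needing neighbor data proceeds while the halo exchange is still in flight. The following is a minimal, hypothetical Python sketch on a 1-D three-point stencil. It is not the paper's AWM-Olsen design. The averaging rule, the function names, and the `receive_ghosts` callback are illustrative assumptions, and no real communication takes place. The callback stands in for completing non-blocking MPI requests or an MPI-2 synchronization call.

```python
# Toy 1-D three-point stencil on one process's local slice.
# Ghost (halo) values stand in for data that a real code would receive from
# neighboring processes via non-blocking or one-sided MPI; here they are
# simply supplied by the caller. All names are hypothetical.

def stencil(left, center, right):
    """Hypothetical update rule: average of a point and its two neighbors."""
    return (left + center + right) / 3.0


def update_blocking(local, ghost_left, ghost_right):
    """Reference order: wait for the ghosts first, then update every point."""
    padded = [ghost_left] + local + [ghost_right]
    return [stencil(padded[i - 1], padded[i], padded[i + 1])
            for i in range(1, len(padded) - 1)]


def update_overlapped(local, receive_ghosts):
    """Overlap-friendly order: interior first, boundary after ghosts arrive.

    receive_ghosts is a hypothetical stand-in for completing the halo
    exchange (e.g. waiting on non-blocking requests or closing an MPI-2
    access epoch). It returns (ghost_left, ghost_right).
    """
    n = len(local)
    new = [0.0] * n
    # 1. Interior points read only local data, so in a real code they can be
    #    computed while the halo messages are still in transit.
    for i in range(1, n - 1):
        new[i] = stencil(local[i - 1], local[i], local[i + 1])
    # 2. Complete the (simulated) communication.
    ghost_left, ghost_right = receive_ghosts()
    # 3. Only the boundary points need the ghost values.
    if n == 1:
        new[0] = stencil(ghost_left, local[0], ghost_right)
    else:
        new[0] = stencil(ghost_left, local[0], local[1])
        new[n - 1] = stencil(local[n - 2], local[n - 1], ghost_right)
    return new


if __name__ == "__main__":
    local = [1.0, 2.0, 3.0, 4.0]
    print(update_blocking(local, 0.0, 5.0))
    print(update_overlapped(local, lambda: (0.0, 5.0)))
```

Both orderings produce identical results because only the two boundary points read ghost values. That property is what allows the interior update to be moved ahead of the communication wait. The fraction of communication time that can actually be hidden then depends on how much interior work is available and on whether the MPI library makes progress on the transfer in the background.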