NAMD: biomolecular simulation on thousands of processors
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
SuperLU_DIST: A scalable distributed-memory sparse direct solver for unsymmetric linear systems
ACM Transactions on Mathematical Software (TOMS)
Performance of Various Computers Using Standard Linear Equations Software
Performance Comparison of MPI Implementations over InfiniBand, Myrinet and Quadrics
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
High performance RDMA-based MPI implementation over InfiniBand
International Journal of Parallel Programming - Special issue I: The 17th annual international conference on supercomputing (ICS'03)
RDMA read based rendezvous protocol for MPI over InfiniBand: design alternatives and benefits
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Performance characterization of molecular dynamics techniques for biomolecular simulations
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
InfiniBand scalability in Open MPI
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Shared receive queue based scalable MPI design for InfiniBand clusters
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Adaptive connection management for scalable MPI over InfiniBand
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
The University of Florida sparse matrix collection
ACM Transactions on Mathematical Software (TOMS)
High performance MPI design using unreliable datagram for ultra-scale InfiniBand clusters
Proceedings of the 21st annual international conference on Supercomputing
High-performance Ethernet-based communications for future multi-core processors
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Evaluating Sparse Data Storage Techniques for MPI Groups and Communicators
ICCS '08 Proceedings of the 8th international conference on Computational Science, Part I
MPC-MPI: An MPI Implementation Reducing the Overall Memory Consumption
Proceedings of the 16th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Impact of Node Level Caching in MPI Job Launch Mechanisms
Proceedings of the 16th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Journal of Parallel and Distributed Computing
Investigations on InfiniBand: efficient network buffer utilization at scale
PVM/MPI'07 Proceedings of the 14th European conference on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Computers and Electronics in Agriculture
InfiniBand is an emerging HPC interconnect being deployed in very large-scale clusters, with even larger InfiniBand-based clusters expected in the near future. The Message Passing Interface (MPI) is the programming model of choice for scientific applications running on these large-scale clusters, so it is critical that the MPI implementation used be based on a scalable, high-performance design. We analyze the performance and scalability of MVAPICH, a popular open-source MPI implementation for InfiniBand, from an application standpoint. We measure the performance and memory requirements of the MPI library while executing several well-known applications and benchmarks, such as NAS, SuperLU, NAMD, and HPL, on a 64-node InfiniBand cluster. Our analysis reveals that the latest MVAPICH design requires an order of magnitude less internal MPI memory (averaged per process) while still delivering the best possible performance. Further, for the benchmarks and applications evaluated, the internal memory requirement of MVAPICH remains nearly constant at around 5-10 MB as the number of processes increases, indicating that the MVAPICH design is highly scalable.
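The memory figures quoted in the abstract are averages per process, collected while real applications run over the MPI library. As a rough illustration of that kind of measurement only (the paper's own instrumentation lives inside MVAPICH and is not shown here), the sketch below samples each rank's resident set size from Linux's /proc/self/status around an application phase and averages the growth across all ranks; the file path, the use of VmRSS, and the placement of the probes are assumptions of this example, not the authors' method.

```c
/* Illustrative sketch only: averaging per-process memory growth across MPI
 * ranks, assuming a Linux /proc filesystem. It does not reproduce the
 * paper's internal MVAPICH instrumentation. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read the calling process's resident set size (VmRSS, in kB). */
static long rss_kb(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];
    long kb = -1;

    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        if (strncmp(line, "VmRSS:", 6) == 0) {
            kb = atol(line + 6);   /* skip the "VmRSS:" label */
            break;
        }
    }
    fclose(f);
    return kb;
}

int main(int argc, char **argv)
{
    int rank, size;
    long before, after, delta, sum = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    before = rss_kb();

    /* ... application or benchmark kernel would run here ... */

    after = rss_kb();
    delta = after - before;

    /* Average the per-process growth, mirroring the "average per process"
     * figure reported in the abstract. */
    MPI_Reduce(&delta, &sum, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("average RSS growth per process: %ld kB\n", sum / size);

    MPI_Finalize();
    return 0;
}
```

A probe like this captures the whole process image (application data plus library buffers), so it overstates the MPI-internal share; separating the two is exactly why library-level accounting, as used in the study, is preferable.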