High performance RDMA-based MPI implementation over InfiniBand

Authors: Jiuxing Liu, Jiesheng Wu, Sushmitha P. Kini, Pete Wyckoff, Dhabaleswar K. Panda
Affiliations: The Ohio State University, Columbus, OH (Liu, Wu, Kini, Panda); Ohio Supercomputer Center, Columbus, OH (Wyckoff)
Venue: ICS '03: Proceedings of the 17th Annual International Conference on Supercomputing
Year: 2003

Abstract

Although the InfiniBand Architecture is relatively new in the high performance computing area, it offers many features that help improve the performance of communication subsystems. One of these features is Remote Direct Memory Access (RDMA). In this paper, we propose a new design of MPI over InfiniBand that brings the benefits of RDMA not only to large messages but also to small and control messages. We also achieve better scalability by exploiting application communication patterns and by combining send/receive operations with RDMA operations. Our RDMA-based MPI implementation currently delivers a latency of 6.8 microseconds for small messages and a peak bandwidth of 871 million bytes (831 MB, where 1 MB = 2^20 bytes) per second. Performance evaluation at the MPI level shows that, for small messages, our RDMA-based design can reduce latency by 24%, increase bandwidth by over 104%, and reduce host overhead by up to 22%. For large messages, we improve performance by reducing the time spent transferring control messages. We also show that our new design benefits MPI collective communication and the NAS Parallel Benchmarks.