Performance Comparison of MPI Implementations over InfiniBand, Myrinet and Quadrics

Authors:
Jiuxing Liu;Balasubramanian Chandrasekaran;Jiesheng Wu;Weihang Jiang;Sushmitha Kini;Weikuan Yu;Darius Buntinas;Peter Wyckoff;D. K. Panda
Affiliations:
The Ohio State University, Columbus;The Ohio State University, Columbus;The Ohio State University, Columbus;The Ohio State University, Columbus;The Ohio State University, Columbus;The Ohio State University, Columbus;The Ohio State University, Columbus;Ohio Supercomputer Center, Columbus, OH;The Ohio State University, Columbus
Venue:
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Year:
2003

Citing 14
Cited 38

LogP: towards a realistic model of parallel computation

PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
A high-performance, portable implementation of the MPI message passing interface standard

Parallel Computing
Effects of communication latency, overhead, and bandwidth in a cluster architecture

Proceedings of the 24th annual international symposium on Computer architecture
Architectural requirements and scalability of the NAS parallel benchmarks

SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Architectural and performance evaluation of GigaNet and Myrinet interconnects on clusters of small-scale SMP servers

Proceedings of the 2000 ACM/IEEE conference on Supercomputing
MPI-The Complete Reference, Volume 1: The MPI Core

MPI-The Complete Reference, Volume 1: The MPI Core
Myrinet: A Gigabit-per-Second Local Area Network

IEEE Micro
The Quadrics Network: High-Performance Clustering Technology

IEEE Micro
Communication Characteristics of Large-Scale Scientific Applications for Contemporary Cluster Architectures

IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
High performance RDMA-based MPI implementation over InfiniBand

ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Impact of On-Demand Connection Management in MPI over VIA

CLUSTER '02 Proceedings of the IEEE International Conference on Cluster Computing
An Evaluation of Current High-Performance Networks

IPDPS '03 Proceedings of the 17th International Symposium on Parallel and Distributed Processing
A General Predictive Performance Model for Wavefront Algorithms on Clusters of SMPs

ICPP '00 Proceedings of the 2000 International Conference on Parallel Processing
Pin-down Cache: A Virtual Memory Management Technique for Zero-copy Communication

IPPS '98 Proceedings of the 12th International Parallel Processing Symposium

An analysis of the impact of MPI overlap and independent progress

Proceedings of the 18th annual international conference on Supercomputing
Evaluating InfiniBand Performance with PCI Express

IEEE Micro
Analyzing the Impact of Overlap, Offload, and Independent Progress for Message Passing Interface Applications

International Journal of High Performance Computing Applications
Performance Evaluation of Deterministic Routings, Multicasts, and Topologies on RHiNET-2 Cluster

IEEE Transactions on Parallel and Distributed Systems
QsNetII: Defining High-Performance Network Design

IEEE Micro
An Application-Based Performance Characterization of the Columbia Supercluster

SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Advances, Applications and Performance of the Global Arrays Shared Memory Programming Toolkit

International Journal of High Performance Computing Applications
High Performance Remote Memory Access Communication: The ARMCI Approach

International Journal of High Performance Computing Applications
Studying the performance of overlapping communication and computation by active message: INUKTITUT case

PDCN'06 Proceedings of the 24th IASTED international conference on Parallel and distributed computing and networks
High-performance and scalable MPI over InfiniBand with reduced memory usage: an in-depth performance analysis

Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Optimizing communication overlap for high-speed networks

Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Coprocessor design to support MPI primitives in configurable multiprocessors

Integration, the VLSI Journal
High performance MPI design using unreliable datagram for ultra-scale InfiniBand clusters

Proceedings of the 21st annual international conference on Supercomputing
Performance evaluation on low-latency Communication mechanism of DIMMnet-2

PDCN'07 Proceedings of the 25th IASTED International Multi-Conference: parallel and distributed computing and networks
Benchmarking the Columbia Supercluster

International Journal of High Performance Computing Applications
Performance evaluation of the Sun Fire Link SMP clusters

International Journal of High Performance Computing and Networking
Performance evaluation for neutron transport application using message passing

International Journal of High Performance Computing and Networking
Overcoming the processor communication overhead in MPI applications

SpringSim '07 Proceedings of the 2007 spring simulation multiconference - Volume 2
Performance implications of virtualizing multicore cluster machines

Proceedings of the 2nd workshop on System-level virtualization for high performance computing
Packet prediction for speculative cut-through switching

Proceedings of the 4th ACM/IEEE Symposium on Architectures for Networking and Communications Systems
Design optimization of a highly parallel InfiniBand host channel adapter

Proceedings of the 4th ACM/IEEE Symposium on Architectures for Networking and Communications Systems
Evaluating high performance communication: a power perspective

Proceedings of the 23rd international conference on Supercomputing
A speculative and adaptive MPI rendezvous protocol over RDMA-enabled interconnects

International Journal of Parallel Programming
The Importance of Non-Data-Communication Overheads in MPI

International Journal of High Performance Computing Applications
Ensemble routing for datacenter networks

Proceedings of the 6th ACM/IEEE Symposium on Architectures for Networking and Communications Systems
Motivating future interconnects: a differential measurement analysis of PCI latency

Proceedings of the 5th ACM/IEEE Symposium on Architectures for Networking and Communications Systems
Benefits of high speed interconnects to cluster file systems: a case study with Lustre

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
A preliminary analysis of the InfiniPath and XD1 network interfaces

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Performance evaluation of supercomputers using HPCC and IMB benchmarks

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Performance enhancement of SMP clusters with multiple network interfaces using virtualization

ISPA'06 Proceedings of the 2006 international conference on Frontiers of High Performance Computing and Networking
Challenges and issues in benchmarking MPI

EuroPVM/MPI'06 Proceedings of the 13th European PVM/MPI User's Group conference on Recent advances in parallel virtual machine and message passing interface
Prediction of communication latency over complex network behaviors on SMP clusters

EPEW'05/WS-FM'05 Proceedings of the 2005 international conference on European Performance Engineering, and Web Services and Formal Methods, international conference on Formal Techniques for Computer Systems and Business Processes
WMTools - assessing parallel application memory utilisation at scale

EPEW'11 Proceedings of the 8th European conference on Computer Performance Engineering
Can PDES scale in environments with heterogeneous delays?

Proceedings of the 2013 ACM SIGSIM conference on Principles of advanced discrete simulation
Characterization and modeling of PIDX parallel I/O for performance optimization

SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Consolidated cluster systems for data centers in the cloud age: a survey and analysis

Frontiers of Computer Science: Selected Publications from Chinese Universities
Performance modelling of parallel BLAST using Intel and PGI compilers on an InfiniBand-based HPC cluster

International Journal of Bioinformatics Research and Applications
A pilot study: design patterns in parallel program development

SE-HPCCSE '13 Proceedings of the 1st International Workshop on Software Engineering for High Performance Computing in Computational Science and Engineering

Abstract

In this paper, we present a comprehensive performance comparison of MPI implementations over InfiniBand, Myrinet and Quadrics. Our performance evaluation consists of two major parts. The first part consists of a set of MPI level micro-benchmarks that characterize different aspects of MPI implementations. The second part of the performance evaluation consists of application level benchmarks. We have used the NAS Parallel Benchmarks and the sweep3D benchmark. We not only present the overall performance results, but also relate application communication characteristics to the information we acquired from the micro-benchmarks. Our results show that the three MPI implementations all have their advantages and disadvantages. For our 8-node cluster, InfiniBand can offer significant performance improvements for a number of applications compared with Myrinet and Quadrics when using the PCI-X bus. Even with just the PCI bus, InfiniBand can still perform better if the applications are bandwidth-bound.