MPI-LAPI: An Efficient Implementation of MPI for IBM RS/6000 SP Systems

Authors:
Mohammad Banikazemi; Rama K. Govindaraju; Robert Blackmore; Dhabaleswar K. Panda
Affiliations:
IBM T.J. Watson Research Center, Yorktown Heights, NY; IBM Power Parallel Systems, Poughkeepsie, NY; IBM Power Parallel Systems, Poughkeepsie, NY; Ohio State Univ., Columbus
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
2001


Abstract

The IBM RS/6000 SP system is one of the most cost-effective commercially available high-performance machines. IBM RS/6000 SP systems support the Message Passing Interface (MPI) standard and LAPI, a low-level, reliable, and efficient one-sided communication API implemented on these systems. This paper explains how the high performance of LAPI has been exploited to implement the MPI standard more efficiently than the existing native MPI implementation. It describes how such an implementation avoids unnecessary data copies at both the sending and receiving sides, and discusses how problems arising from mismatches between the requirements of the MPI standard and the features of LAPI are resolved. As a result of this exercise, certain enhancements to LAPI are identified that enable an efficient implementation of MPI on LAPI. The performance of the new MPI implementation is compared with that of the underlying LAPI itself. Its latency (in polling and interrupt modes) and bandwidth are also compared with those of the native MPI implementation on RS/6000 SP systems. The results indicate that the MPI implementation on LAPI performs comparably to or better than the original MPI implementation in most cases. Improvements of up to 17.3 percent in polling-mode latency, 35.8 percent in interrupt-mode latency, and 20.9 percent in bandwidth are obtained for certain message sizes. The implementation of MPI on top of LAPI also outperforms the native MPI implementation on the NAS Parallel Benchmarks.
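The percentage improvements quoted in the abstract are relative figures against the native MPI baseline. The abstract does not give the formula, so the following is only a minimal sketch of how such figures are conventionally computed, assuming latency improvement is the relative reduction in time and bandwidth improvement is the relative increase in throughput. The function names and all numeric inputs are hypothetical and are not measurements from the paper.

```python
def latency_improvement(native_us: float, new_us: float) -> float:
    """Percent reduction in latency relative to the native baseline.

    Assumed convention: lower latency is better, so the improvement is
    (native - new) / native.
    """
    return (native_us - new_us) / native_us * 100.0


def bandwidth_improvement(native_mbps: float, new_mbps: float) -> float:
    """Percent increase in bandwidth relative to the native baseline.

    Assumed convention: higher bandwidth is better, so the improvement is
    (new - native) / native.
    """
    return (new_mbps - native_mbps) / native_mbps * 100.0


if __name__ == "__main__":
    # Hypothetical illustrative values, NOT results reported in the paper.
    print(latency_improvement(native_us=40.0, new_us=35.0))
    print(bandwidth_improvement(native_mbps=80.0, new_mbps=100.0))
```

Under this convention a positive value means the new implementation is better than the baseline on that metric, and a negative value means it is worse.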