PVM: a framework for parallel distributed computing
Concurrency: Practice and Experience
Active messages: a mechanism for integrated communication and computation
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
IBM Systems Journal
The SP2 high-performance switch
IBM Systems Journal
The communication software and parallel environment of the IBM SP2
IBM Systems Journal
MPI-FM: high performance MPI on workstation clusters
Journal of Parallel and Distributed Computing - Special issue on workstation clusters and network-based computing
Efficient message passing interface (MPI) for parallel computing on clusters of workstations
Journal of Parallel and Distributed Computing - Special issue on workstation clusters and network-based computing
Low-latency communication on the IBM RISC system/6000 SP
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Parallel Computer Architecture: A Hardware/Software Approach
Parallel Computer Architecture: A Hardware/Software Approach
Interconnection Networks: An Engineering Approach
Interconnection Networks: An Engineering Approach
IPPS '99/SPDP '99 Proceedings of the 13th International Symposium on Parallel Processing and the 10th Symposium on Parallel and Distributed Processing
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
MPI on the I-WAY: A Wide-Area, Multimethod Implementation of the Message Passing Interface
MPIDC '96 Proceedings of the Second MPI Developers Conference
High performance RDMA-based MPI implementation over InfiniBand
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Flexible Hardware/Software Support for Message Passing on a Distributed Shared Memory Architecture
Proceedings of the conference on Design, Automation and Test in Europe - Volume 2
Building Multirail InfiniBand Clusters: MPI-Level Design and Performance Evaluation
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
Fast synchronization on shared-memory multiprocessors: An architectural approach
Journal of Parallel and Distributed Computing - Special issue: Design and performance of networks for super-, cluster-, and grid-computing: Part I
High performance RDMA-based MPI implementation over infiniBand
International Journal of Parallel Programming - Special issue I: The 17th annual international conference on supercomputing (ICS'03)
Lazy direct-to-cache transfer during receive operations in a message passing environment
Proceedings of the 3rd conference on Computing frontiers
RDMA control support for fine-grain parallel computations
Journal of Systems Architecture: the EUROMICRO Journal - Special issue: Parallel, distributed and network-based processing
MEDEA '05 Proceedings of the 2005 workshop on MEmory performance: DEaling with Applications , systems and architecture
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Proceedings of the 22nd annual international conference on Supercomputing
Microprocessors & Microsystems
Design and implementation of message-passing services for the Blue Gene/L supercomputer
IBM Journal of Research and Development
Architecture and early performance of the new IBM HPS fabric and adapter
HiPC'04 Proceedings of the 11th international conference on High Performance Computing
ISPA'07 Proceedings of the 5th international conference on Parallel and Distributed Processing and Applications
Hi-index | 0.00 |
The IBM RS/6000 SP system is one of the most cost-effective commercially available high performance machines. IBM RS/6000 SP systems support the Message Passing Interface standard (MPI) and LAPI. LAPI is a low level, reliable, and efficient one-sided communication API library implemented on IBM RS/6000 SP systems. This paper explains how the high performance of the LAPI library has been exploited in order to implement the MPI standard more efficiently than the existing MPI. It describes how to avoid unnecessary data copies at both the sending and receiving sides for such an implementation. The resolution of problems arising from the mismatches between the requirements of the MPI standard and the features of LAPI is discussed. As a result of this exercise, certain enhancements to LAPI are identified to enable an efficient implementation of MPI on LAPI. The performance of the new implementation of MPI is compared with that of the underlying LAPI itself. The latency (in polling and interrupt modes) and bandwidth of our new implementation is compared with that of the native MPI implementation on RS/6000 SP systems. The results indicate that the MPI implementation on LAPI performs comparably to or better than the original MPI implementation in most cases. Improvements of up to 17.3 percent in polling mode latency, 35.8 percent in interrupt mode latency, and 20.9 percent in bandwidth are obtained for certain message sizes. The implementation of MPI on top of LAPI also outperforms the native MPI implementation for the NAS Parallel Benchmarks.