Although the InfiniBand Architecture is relatively new to high performance computing, it offers many features that help improve the performance of communication subsystems. One of these features is Remote Direct Memory Access (RDMA). In this paper, we propose a new design of MPI over InfiniBand that brings the benefit of RDMA not only to large messages, but also to small and control messages. We also achieve better scalability by exploiting application communication patterns and combining send/receive operations with RDMA operations. Our RDMA-based MPI implementation achieves a latency of 6.8 µsec for small messages and a peak bandwidth of 871 million bytes/sec. Performance evaluation shows that, for small messages, our RDMA-based design reduces latency by 24%, increases bandwidth by over 104%, and reduces host overhead by up to 22% compared with the original design. For large data transfers, we improve performance by reducing the time spent transferring control messages. We also show that the new design benefits MPI collective communication and the NAS Parallel Benchmarks.
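To make the idea of combining send/receive with RDMA according to the application's communication pattern more concrete, the following is a minimal sketch of one possible policy. It is not the paper's implementation: the class `ChannelSelector`, the parameters `promote_after` and `max_rdma_peers`, and the promotion rule itself are hypothetical, chosen only to illustrate how frequently contacted peers could be moved to an RDMA channel while the rest stay on send/receive so that per-peer RDMA resources do not grow with the number of processes. No real InfiniBand calls are made.

```python
from collections import Counter


class ChannelSelector:
    """Toy policy: pick a transport per peer from observed traffic.

    Everything here is an illustrative assumption, not the paper's design.
    """

    def __init__(self, promote_after=4, max_rdma_peers=2):
        self.promote_after = promote_after    # hypothetical message-count threshold
        self.max_rdma_peers = max_rdma_peers  # hypothetical cap on RDMA-enabled peers
        self.sent = Counter()                 # small messages sent so far, per peer
        self.rdma_peers = set()               # peers already promoted to RDMA

    def choose(self, peer):
        """Return the channel to use for the next small message to `peer`."""
        self.sent[peer] += 1
        if peer in self.rdma_peers:
            return "rdma"
        # Promote a peer once it has been contacted often enough,
        # as long as the cap on RDMA-enabled peers is not exceeded.
        if (self.sent[peer] >= self.promote_after
                and len(self.rdma_peers) < self.max_rdma_peers):
            self.rdma_peers.add(peer)
            return "rdma"
        return "send/recv"


if __name__ == "__main__":
    selector = ChannelSelector(promote_after=3, max_rdma_peers=1)
    for peer in [1, 1, 1, 2, 2, 2, 1]:
        print(peer, selector.choose(peer))
```

With a cap of one RDMA peer, the first peer to cross the threshold is promoted and later peers remain on send/receive, which is the scalability trade-off the sketch is meant to show.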