High-performance clusters have been growing rapidly in scale. Most of these clusters deploy a high-speed interconnect, such as InfiniBand, to achieve higher performance, and most scientific applications executing on them use the Message Passing Interface (MPI) as the parallel programming model. The MPI library therefore plays a key role in application performance: it must consume as few resources as possible while still scaling well. State-of-the-art MPI implementations over InfiniBand primarily use the Reliable Connection (RC) transport due to its good performance and attractive features. However, the RC transport requires a connection between every pair of communicating processes, and each connection requires several KB of memory. As clusters continue to scale, the memory requirements of RC-based implementations grow accordingly. The connection-less Unreliable Datagram (UD) transport is an attractive alternative, since it eliminates the need to dedicate memory for each pair of processes. In this paper we present a high-performance UD-based MPI design. We implement our design and compare its performance and resource usage with the RC-based MVAPICH. We evaluate NPB, SMG2000, Sweep3D, and sPPM at up to 4K processes on a 9,216-core InfiniBand cluster. For SMG2000 at 4K processes, our prototype shows a 60% speedup and a seven-fold reduction in memory. Additionally, based on our model, our design has an estimated 30-fold reduction in memory over MVAPICH at 16K processes when all connections are created. To the best of our knowledge, this is the first research work to present a high-performance MPI design over InfiniBand that is based entirely on UD and achieves near-identical or better application performance than RC.
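The memory argument above — RC connection state growing with the number of peers while a connection-less UD endpoint stays fixed — can be sketched as a simple scaling model. The per-connection cost (`rc_kb_per_conn`) and the fixed UD endpoint cost (`ud_kb_fixed`) below are illustrative assumptions for the sketch, not measured values from the paper:

```python
# Hedged sketch of per-process MPI memory footprint: RC vs. UD transports.
# rc_kb_per_conn and ud_kb_fixed are hypothetical parameters chosen only
# to show the O(N) vs. O(1) scaling shape described in the abstract.

def rc_memory_kb(num_procs: int, rc_kb_per_conn: float = 88.0) -> float:
    """RC requires one connection per communicating peer, so the
    per-process memory grows linearly with the job size."""
    return (num_procs - 1) * rc_kb_per_conn

def ud_memory_kb(num_procs: int, ud_kb_fixed: float = 512.0) -> float:
    """UD is connection-less: a single endpoint serves all peers,
    so the footprint is constant regardless of job size."""
    return ud_kb_fixed

if __name__ == "__main__":
    for n in (1024, 4096, 16384):
        rc, ud = rc_memory_kb(n), ud_memory_kb(n)
        print(f"{n:6d} procs: RC {rc / 1024:8.1f} MB, "
              f"UD {ud / 1024:5.1f} MB, ratio {rc / ud:7.1f}x")
```

Under these assumed parameters the RC/UD ratio widens as the process count grows, which is the qualitative trend behind the paper's reported memory reductions; the actual factors depend on the real per-connection state and the UD buffer pool sizing.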