Unified Parallel C (UPC) is an emerging parallel programming language based on a shared-memory (PGAS) paradigm, while MPI has been the dominant, widely ported parallel programming model for the past two decades. Real-life scientific applications represent a large investment by domain scientists, and many of them choose MPI because it is considered low-risk. It is therefore unlikely that entire applications will be rewritten in UPC (or another PGAS language) in the near future; it is more likely that parts of these applications will be converted to newer models incrementally. This requires the underlying system software to support both UPC and MPI simultaneously. Unfortunately, the current state of the art in UPC and MPI interoperability leaves much to be desired, both in performance and in ease of use. In this paper, we propose an "Integrated Native Communication Runtime" (INCR) for MPI and UPC communication on InfiniBand clusters. Our library is capable of supporting UPC and MPI communications simultaneously. It is based on the widely used MVAPICH (MPI over InfiniBand) Aptus runtime, which is known to scale to tens of thousands of cores. Our evaluation reveals that INCR delivers equal or better performance than the existing UPC runtime, GASNet, on InfiniBand verbs: with the UPC NAS benchmarks CG and MG (class B) at 128 processes, INCR outperforms the current GASNet implementation by 10% and 23%, respectively.
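To make the interoperability scenario concrete, a hybrid application might mix UPC shared-array accesses with MPI collectives in a single executable, which only works cleanly if both models are serviced by one communication runtime, as INCR provides. The sketch below is hypothetical and not taken from the paper; it assumes a UPC compiler (e.g. upcc) linked against an MPI-interoperable runtime, with MPI ranks and UPC threads coinciding one-to-one.

```c
/* Hypothetical hybrid UPC + MPI fragment (illustrative sketch only). */
#include <upc.h>
#include <mpi.h>
#include <stdio.h>

#define BLOCK 100
/* Blocked shared array: thread t owns elements [t*BLOCK, (t+1)*BLOCK) */
shared [BLOCK] double grid[BLOCK * THREADS];

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    /* UPC portion: each thread writes its own block of the
       globally shared array using PGAS one-sided semantics. */
    upc_forall(int i = 0; i < BLOCK * THREADS; i++; &grid[i])
        grid[i] = (double)MYTHREAD;
    upc_barrier;

    /* MPI portion: a collective over the same set of processes.
       The runtime must carry both models' traffic simultaneously. */
    double local = grid[MYTHREAD * BLOCK], sum = 0.0;
    MPI_Allreduce(&local, &sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (MYTHREAD == 0)
        printf("sum of thread ids = %g\n", sum);  /* 0+1+...+(THREADS-1) */

    MPI_Finalize();
    return 0;
}
```

With today's stacks, such a program typically drags in two independent runtimes side by side (GASNet for the UPC traffic, an MPI library for the collectives), each holding its own network resources; this duplication is the interoperability problem that a single MVAPICH-based runtime like INCR is designed to eliminate.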