High performance intra-node communication support for MPI applications is critical for achieving the best performance from clusters of SMP workstations. Present day MPI stacks cannot make use of operating system kernel support for intra-node communication. This is primarily due to the lack of an efficient, portable, stable and MPI-friendly interface to access the kernel functions. In this paper we attempt to address design challenges for implementing such a high performance and portable kernel module interface. We implement a kernel module interface called LiMIC and integrate it with MVAPICH, an open source MPI over InfiniBand. Our performance evaluation reveals that the point-to-point latency can be reduced by 71% and the bandwidth improved by 405% for a 64KB message size. In addition, LiMIC can improve HPCC Effective Bandwidth and NAS IS class B benchmarks by 12% and 8%, respectively, on an 8-node dual SMP InfiniBand cluster.
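The kernel-assisted, single-copy transfer that the abstract credits for these gains can be illustrated without LiMIC's own interface, which is not shown here. The sketch below instead uses Linux Cross Memory Attach (process_vm_readv, available since kernel 3.2), a later in-kernel mechanism in the same spirit: the receiver pulls the message directly out of the sender's address space in one kernel-mediated copy rather than staging it through a shared-memory bounce buffer. The helper name pull_from_peer and the assumption that the (pid, address, length) triple arrives in a small rendezvous message are illustrative only; they are not part of LiMIC or MVAPICH.

/*
 * Minimal sketch of kernel-assisted single-copy intra-node transfer.
 * Uses Linux Cross Memory Attach (process_vm_readv), not LiMIC's API.
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/types.h>
#include <sys/uio.h>      /* process_vm_readv() */
#include <unistd.h>

/* Receiver-side pull of `len` bytes from address `src_addr` in the
 * address space of the sending process `src_pid`.  In an MPI stack the
 * (src_pid, src_addr, len) triple would typically be delivered in a
 * small rendezvous message over the usual shared-memory channel. */
static ssize_t pull_from_peer(pid_t src_pid, void *src_addr,
                              void *dst_buf, size_t len)
{
    struct iovec local  = { .iov_base = dst_buf,  .iov_len = len };
    struct iovec remote = { .iov_base = src_addr, .iov_len = len };

    /* Single kernel-mediated copy: peer memory -> local buffer. */
    return process_vm_readv(src_pid, &local, 1, &remote, 1, 0);
}

int main(void)
{
    /* Trivial self-copy demo: the "peer" is this same process. */
    char src[64] = "intra-node payload";
    char dst[64] = { 0 };

    ssize_t n = pull_from_peer(getpid(), src, dst, sizeof(src));
    if (n < 0) {
        perror("process_vm_readv");
        return 1;
    }
    printf("copied %zd bytes: \"%s\"\n", n, dst);
    return 0;
}

Compared with the conventional two-copy shared-memory path (sender copies into a shared segment, receiver copies out), a single kernel-mediated copy roughly halves the memory traffic per message, which is the large-message regime where the abstract reports its 64KB bandwidth improvement.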