Xeon Phi, based on the Intel Many Integrated Core (MIC) architecture, packs up to 1 TFLOP/s of performance on a single chip while providing x86_64 compatibility. InfiniBand, meanwhile, is one of the most popular interconnects for supercomputing systems. The software stack on Xeon Phi allows processes to directly access an InfiniBand HCA on the node, providing a low-latency path for internode communication. However, limitations in state-of-the-art chipsets such as Sandy Bridge restrict the bandwidth available for these transfers. In this paper, we propose MVAPICH-PRISM, a novel proxy-based framework to optimize communication performance on such systems. We present several designs and evaluate them using micro-benchmarks and application kernels. Our designs improve internode latency between Xeon Phi processes by up to 65% and internode bandwidth by up to five times. They improve the performance of the MPI_Alltoall operation by up to 65% with 256 processes, and improve the performance of a 3D stencil communication kernel and the P3DFFT library by 56% and 22% with 1,024 and 512 processes, respectively.
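The abstract mentions evaluation with micro-benchmarks and reports MPI_Alltoall improvements. As a rough illustration only, the sketch below shows a minimal MPI_Alltoall timing micro-benchmark of the kind such an evaluation might use; the message size, warm-up count, and iteration count are illustrative assumptions and this is not the authors' actual benchmark code.

```c
/* Minimal sketch of an MPI_Alltoall timing micro-benchmark.
 * Parameters (message size, warm-up, iterations) are illustrative
 * assumptions, not taken from the paper. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const int msg_size = 4096;  /* bytes exchanged with each peer (assumed) */
    const int warmup   = 10;
    const int iters    = 100;

    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    char *sendbuf = malloc((size_t)msg_size * size);
    char *recvbuf = malloc((size_t)msg_size * size);

    /* Warm-up iterations so connection setup does not skew the timing. */
    for (int i = 0; i < warmup; i++)
        MPI_Alltoall(sendbuf, msg_size, MPI_CHAR,
                     recvbuf, msg_size, MPI_CHAR, MPI_COMM_WORLD);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++)
        MPI_Alltoall(sendbuf, msg_size, MPI_CHAR,
                     recvbuf, msg_size, MPI_CHAR, MPI_COMM_WORLD);
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("avg MPI_Alltoall time: %.2f us (%d bytes/peer, %d ranks)\n",
               (t1 - t0) / iters * 1e6, msg_size, size);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}
```

Run with an MPI launcher, e.g. `mpirun -np 256 ./alltoall_bench`, to obtain an average per-iteration MPI_Alltoall time at a given process count.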