Hot-Spot Avoidance With Multi-Pathing Over InfiniBand: An MPI Perspective

Authors:
A. Vishnu;M. Koop;A. Moody;A. R. Mamidala;S. Narravula;D. K. Panda
Affiliations:
Ohio State University;Ohio State University;Lawrence Livermore National Lab;Ohio State University;Ohio State University;Ohio State University
Venue:
CCGRID '07 Proceedings of the Seventh IEEE International Symposium on Cluster Computing and the Grid
Year:
2007

Abstract

Large-scale InfiniBand clusters are becoming increasingly popular, as reflected by the TOP500 supercomputer rankings. At the same time, the fat tree has become a popular interconnection topology for these clusters, since it makes multiple paths available between a pair of nodes. However, even with a fat tree, hot-spots may occur in the network, depending on the route configuration between end nodes and the communication pattern(s) in the application. To make matters worse, the deterministic routing of InfiniBand prevents applications from transparently using multiple paths to avoid hot-spots in the network. Simulation-based studies of switches and adapters that implement congestion control have been proposed in the literature. However, these studies have focused on providing congestion control for the communication path, and not on utilizing multiple paths in the network for hot-spot avoidance. In this paper, we design an MPI functionality that provides hot-spot avoidance for different communication patterns, without a priori knowledge of the pattern. We leverage the LMC (LID Mask Count) mechanism of InfiniBand to create multiple paths in the network and present the design issues (scheduling policies, selecting the number of paths, scalability aspects) of our design. We implement our design and evaluate it with Pallas collective communication benchmarks and MPI applications. On an InfiniBand cluster with 48 processes, MPI All-to-all Personalized shows an improvement of 27%. Our evaluation with the NAS Parallel Benchmarks on 64 processes shows a significant improvement in execution time with this functionality.
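
As a rough illustration of the LMC idea mentioned in the abstract, the sketch below enumerates the destination LIDs that a port with a given LMC value answers to (2^LMC consecutive LIDs, each of which the subnet manager may route differently) and spreads the chunks of one message across them. The abstract does not say which scheduling policies the paper uses, so the round-robin policy here is an assumption chosen for simplicity. The function names lmc_lids and stripe_round_robin, and the chunk size, are hypothetical and do not come from the paper or from any MPI library.

```python
from itertools import cycle


def lmc_lids(base_lid, lmc):
    """Return the 2**lmc consecutive LIDs assigned to one port.

    Assumes base_lid is the first LID of the range. LMC is a 3-bit
    field, so at most 2**7 = 128 LIDs (paths) per port are possible.
    """
    if not 0 <= lmc <= 7:
        raise ValueError("LMC must be in the range 0..7")
    return [base_lid + i for i in range(1 << lmc)]


def stripe_round_robin(message_len, lids, chunk_size):
    """Plan how to split one message across several paths.

    Returns a list of (destination_lid, offset, length) tuples.
    Round-robin is an assumed policy used only for illustration;
    it is not claimed to be the policy evaluated in the paper.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not lids:
        raise ValueError("at least one destination LID is required")
    path = cycle(lids)
    plan = []
    for offset in range(0, message_len, chunk_size):
        length = min(chunk_size, message_len - offset)
        plan.append((next(path), offset, length))
    return plan


if __name__ == "__main__":
    paths = lmc_lids(base_lid=16, lmc=2)  # four LIDs: 16, 17, 18, 19
    for dlid, offset, length in stripe_round_robin(10_000, paths, 4_096):
        print(f"send bytes [{offset}, {offset + length}) via DLID {dlid}")
```

In a real implementation each destination LID would correspond to a separate connection (queue pair) to the same peer, so the number of paths trades hot-spot avoidance against connection state, which is the scalability aspect the abstract alludes to.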