Contention-free many-to-many communication scheduling for high performance clusters

  • Authors:
  • Satyajit Banerjee;Atish Datta Chowdhury;Koushik Sinha;Subhas Kumar Ghosh

  • Affiliations:
  • Honeywell Technology Solutions, Bangalore, India;Honeywell Technology Solutions, Bangalore, India;Honeywell Technology Solutions, Bangalore, India;Siemens Corporate Research and Technologies, Bangalore, India

  • Venue:
  • ICDCIT'11 Proceedings of the 7th international conference on Distributed computing and internet technology
  • Year:
  • 2011

Abstract

In the context of generating efficient, contention-free schedules for inter-node communication through a switch fabric in cluster computing or data center environments, all-to-all scheduling with equal-sized data transfer requests has been studied in the literature [1, 3, 4]. In this paper, we propose a communication scheduling module (CSM) that generates contention-free communication schedules for many-to-many communication with arbitrarily sized data. To this end, we propose three approximation algorithms: PST, LDT and SDT. Periodically, the CSM generates a bipartite graph from the set of received requests, determines which of the three algorithms gives the best approximation factor on this graph, and executes that algorithm to generate a contention-free schedule. Algorithm PST has a worst-case run time of $O(\max(\Delta|E|, |E|\log|E|))$ and guarantees an approximation factor of $2H_{2\Delta-1}$, where $|E|$ is the number of edges in the bipartite graph, $\Delta$ is its maximum node degree, and $H_{2\Delta-1}$ is the $(2\Delta-1)$-th harmonic number. LDT runs in $O(|E|^2)$ time and has an approximation factor of $2(1+\tau)$, where $\tau$ is a constant guard band, or pause time, that eliminates the possibility of contention (in an apparently contention-free schedule) caused by system jitter and synchronization inaccuracies between the nodes. SDT gives an approximation factor of $4\log(w_{\max})$ and has a worst-case run time of $O(\Delta|E|\log(w_{\max}))$, where $w_{\max}$ is the longest communication time in the set of received requests.
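
The selection step described in the abstract can be illustrated with a minimal Python sketch. It only compares the three stated worst-case approximation factors on a given request set; it does not implement PST, LDT or SDT themselves. The function names, the `(sender, receiver, duration)` request format, and the choice of base 2 for the logarithm are assumptions made for illustration, since the abstract does not specify them. The sketch also assumes that each request is its own edge, that a node acting as sender and as receiver appears as two distinct vertices, and that durations are expressed in units where the longest one exceeds 1.

```python
import math
from collections import Counter


def harmonic(n):
    """Return the n-th harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))


def max_degree(requests):
    """Maximum node degree of the bipartite request graph.

    Each request (sender, receiver, duration) is treated as one edge.
    Senders and receivers are kept on separate sides of the graph
    (an assumption; the abstract does not describe the construction).
    """
    degree = Counter()
    for sender, receiver, _ in requests:
        degree[("sender", sender)] += 1
        degree[("receiver", receiver)] += 1
    return max(degree.values())


def approximation_factors(requests, tau, log_base=2.0):
    """Evaluate the three bounds quoted in the abstract.

    log_base is an assumption: the abstract writes log(w_max)
    without stating the base.
    """
    delta = max_degree(requests)
    w_max = max(duration for _, _, duration in requests)
    return {
        "PST": 2 * harmonic(2 * delta - 1),
        "LDT": 2 * (1 + tau),
        "SDT": 4 * math.log(w_max, log_base),
    }


def select_algorithm(requests, tau, log_base=2.0):
    """Pick the algorithm whose stated bound is smallest on this graph."""
    factors = approximation_factors(requests, tau, log_base)
    return min(factors, key=factors.get)


if __name__ == "__main__":
    # Hypothetical request set: (sender, receiver, communication time).
    example = [("a", "x", 8), ("a", "y", 4), ("b", "x", 16)]
    print(approximation_factors(example, tau=0.1))
    print(select_algorithm(example, tau=0.1))
```

Because the LDT bound depends only on $\tau$ while the PST and SDT bounds grow with $\Delta$ and $w_{\max}$ respectively, a comparison of this kind tends to favor LDT when the guard band is small and PST or SDT when it is large.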