Employing transport layer multi-railing in cluster networks

  • Authors:
  • Brad Penoff; Humaira Kamal; Alan Wagner; Mike Tsai; Karol Mroz; Janardhan Iyengar

  • Affiliations:
  • Department of Computer Science, University of British Columbia, Vancouver, BC, Canada; Department of Computer Science, University of British Columbia, Vancouver, BC, Canada; Department of Computer Science, University of British Columbia, Vancouver, BC, Canada; Cisco Systems Inc., San Jose, CA, United States; Cisco Systems Inc., San Jose, CA, United States; Department of Math and Computer Science, Franklin and Marshall College, United States

  • Venue:
  • Journal of Parallel and Distributed Computing
  • Year:
  • 2010

Abstract

Assembling commodity off-the-shelf parts is a well-established way to build inexpensive medium- to large-size computing clusters. Many commodity mid-range motherboards come with multiple Gigabit Ethernet interfaces, and the low cost per port for Gigabit Ethernet makes switches inexpensive as well. Our objective in this work is to take advantage of multiple inexpensive Gigabit network cards and Ethernet switches to improve both the communication performance and the reliability of a cluster. Unlike previous approaches that use multiple network connections for multi-railing, we consider CMT (Concurrent Multipath Transfer), an extension to SCTP (Stream Control Transmission Protocol), a transport protocol developed by the IETF, that makes use of the multiple paths that exist between two hosts. In this work, we explore the applicability of CMT in the transport layer of the network stack to high-performance computing environments. We develop SCTP-based MPI (Message Passing Interface) middleware for MPICH2 and Open MPI, and evaluate the reliability and communication performance of the system. Using Open MPI with support for message striping over multiple paths at the middleware level, we compare multi-railing support in the middleware with multi-railing support at the transport layer.
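
To make the idea of message striping over multiple paths concrete, the following is a minimal sketch of striping done at the middleware level: a message is cut into numbered chunks, the chunks are dealt round-robin to the available rails, and the receiver reorders them by sequence number. It is an illustration only, not the scheduling used by Open MPI or by CMT. The function names (stripe, reassemble), the fixed chunk size, and the round-robin policy are all assumptions introduced here. With transport-layer multi-railing, comparable bookkeeping would sit inside the transport protocol rather than in the middleware.

```python
from typing import List, Tuple

# A chunk is (sequence_number, payload). Both names below are hypothetical
# helpers for illustration; they are not part of any MPI or SCTP API.
Chunk = Tuple[int, bytes]


def stripe(message: bytes, num_rails: int, chunk_size: int) -> List[List[Chunk]]:
    """Split a message into fixed-size chunks and deal them round-robin to rails."""
    if num_rails < 1 or chunk_size < 1:
        raise ValueError("num_rails and chunk_size must be positive")
    rails: List[List[Chunk]] = [[] for _ in range(num_rails)]
    for seq, offset in enumerate(range(0, len(message), chunk_size)):
        # Chunk number seq goes to rail (seq mod num_rails).
        rails[seq % num_rails].append((seq, message[offset:offset + chunk_size]))
    return rails


def reassemble(rails: List[List[Chunk]]) -> bytes:
    """Merge chunks from all rails and restore the original byte order."""
    chunks = [chunk for rail in rails for chunk in rail]
    chunks.sort(key=lambda chunk: chunk[0])  # order by sequence number
    return b"".join(payload for _, payload in chunks)


if __name__ == "__main__":
    per_rail = stripe(b"abcdefghij", num_rails=2, chunk_size=3)
    for rail_id, rail in enumerate(per_rail):
        print(rail_id, rail)
    print(reassemble(per_rail))
```

Round-robin is the simplest possible policy and ignores differences in bandwidth or loss between rails. A real scheduler would likely weight the rails, which is one reason the placement of this logic (middleware or transport) matters.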