RI2N/DRV: Multi-link ethernet for high-bandwidth and fault-tolerant network on PC clusters

  • Authors:
  • Shin'ichi Miura; Toshihiro Hanawa; Taiga Yonemoto; Taisuke Boku; Mitsuhisa Sato

  • Affiliations:
  • Center for Computational Sciences, University of Tsukuba, 1-1-1 Tennodai, Ibaraki 305-8577, Japan; Center for Computational Sciences, University of Tsukuba, 1-1-1 Tennodai, Ibaraki 305-8577, Japan; Graduate School of Systems and Information Engineering, University of Tsukuba, 1-1-1 Tennodai, Ibaraki 305-8577, Japan; Center for Computational Sciences, University of Tsukuba, 1-1-1 Tennodai, Ibaraki 305-8577, Japan; Center for Computational Sciences, University of Tsukuba, 1-1-1 Tennodai, Ibaraki 305-8577, Japan

  • Venue:
  • IPDPS '09 Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing
  • Year:
  • 2009

Abstract

Although recent high-end interconnection network devices and switches provide a high performance-to-cost ratio, most small to medium-sized PC clusters are still built on the commodity network, Ethernet. To enhance performance on commonly used Gigabit Ethernet networks, link aggregation or bonding technology is used. Current Linux kernels include software named Linux Channel Bonding (LCB), which is based on IEEE 802.3ad Link Aggregation technology. However, standard LCB is poorly matched to the TCP protocol; consequently, both large latency and bandwidth instability can occur. LCB also supports fault tolerance, but its usability is insufficient. We developed a new implementation similar to LCB, named Redundant Interconnection with Inexpensive Network with Driver (RI2N/DRV), for use on Gigabit Ethernet. RI2N/DRV has a complete software stack that is well suited to TCP as the upper-layer protocol. Our algorithm suppresses unnecessary ACK packets and packet retransmissions, even under imbalanced network traffic and link failures on multiple links. It provides both high-bandwidth and fault-tolerant communication on multi-link Gigabit Ethernet. We confirmed that this system improves the performance and reliability of the network, and that it can be applied to ordinary UNIX services such as the network file system (NFS) without any modification of other modules.
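
The mismatch between packet striping and TCP mentioned in the abstract can be illustrated with a small simulation. The sketch below is not the RI2N/DRV algorithm, which the abstract does not specify; it is a toy model under stated assumptions: packets are striped round-robin over links with fixed one-way delays, and the receiver uses cumulative ACKs. All function names and the delay values are hypothetical. It shows that imbalanced links reorder packets and provoke duplicate ACKs, and that resequencing below TCP removes them.

```python
from typing import List, Tuple


def stripe_round_robin(num_packets: int, link_delays: List[float],
                       send_interval: float = 1.0) -> List[int]:
    """Return packet sequence numbers in arrival order.

    Assumed model: packet i is sent at time i * send_interval on link
    i % len(link_delays) and arrives after that link's fixed delay.
    """
    arrivals: List[Tuple[float, int]] = []
    for seq in range(num_packets):
        link = seq % len(link_delays)
        arrivals.append((seq * send_interval + link_delays[link], seq))
    arrivals.sort()  # order by arrival time, ties broken by sequence number
    return [seq for _, seq in arrivals]


def count_duplicate_acks(arrival_order: List[int]) -> int:
    """Count arrivals that a cumulative-ACK receiver answers with a duplicate ACK.

    A packet arriving while an earlier one is still missing cannot advance
    the cumulative ACK, so the receiver repeats the previous ACK.
    """
    expected = 0
    buffered = set()
    dup_acks = 0
    for seq in arrival_order:
        if seq == expected:
            expected += 1
            # Drain packets that were waiting for the gap to close.
            while expected in buffered:
                buffered.remove(expected)
                expected += 1
        else:
            buffered.add(seq)
            dup_acks += 1
    return dup_acks


def resequence(arrival_order: List[int]) -> List[int]:
    """Hold early packets and release them in sequence (done below TCP)."""
    expected = 0
    held = set()
    delivered: List[int] = []
    for seq in arrival_order:
        held.add(seq)
        while expected in held:
            held.remove(expected)
            delivered.append(expected)
            expected += 1
    return delivered


balanced = stripe_round_robin(8, [1.0, 1.0])
imbalanced = stripe_round_robin(8, [3.5, 0.0])  # link 0 is slower (assumed)
print(imbalanced, count_duplicate_acks(imbalanced))
print(count_duplicate_acks(resequence(imbalanced)))
```

In standard TCP, three duplicate ACKs trigger fast retransmit, so reordering of this kind can cause retransmissions of packets that were never lost. The resequencing step here only indicates the general idea of hiding reordering from TCP; how RI2N/DRV handles it is described in the paper itself.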