Message Passing for Linux Clusters with Gigabit Ethernet Mesh Connections

  • Authors:
  • Jie Chen;William Watson III;Robert Edwards;Weizhen Mao

  • Affiliations:
  • HPC Group, Jefferson Lab, Newport News, VA;HPC Group, Jefferson Lab, Newport News, VA;Theory Group, Jefferson Lab, Newport News, VA;College of William and Mary, Williamsburg, VA

  • Venue:
  • IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 9 - Volume 10
  • Year:
  • 2005

Abstract

Multiple copper-based commodity Gigabit Ethernet (GigE) interconnects (adapters) on a single host can lead to Linux clusters with mesh/torus connections without using expensive switches and high-speed network interconnects (NICs). However, traditional message passing systems based on TCP for GigE will not perform well for this type of cluster because of the overhead of TCP for multiple GigE links. In this paper, we present two OS-bypass message passing systems that are based on a modified M-VIA (an implementation of the VIA specification) for two production GigE mesh clusters: one is constructed as a 4x8x8 (256 nodes) torus and has been in production use for a year; the other is constructed as a 6x8x8 (384 nodes) torus and was deployed recently. One of the message passing systems targets a specific application domain and is called QMP; the other is an implementation of the MPI 1.1 specification. The GigE mesh clusters using these two message passing systems achieve about 18.5 µs half round-trip latency and 400 MB/s total bandwidth, which compare reasonably well to systems using specialized high-speed adapters in a switched architecture, at much lower cost.
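
The torus layout described above can be illustrated with a minimal sketch that lists the six directly connected neighbors of a node in the 4x8x8 cluster. The row-major mapping from node rank to (x, y, z) coordinates and the function names are assumptions made for illustration; the abstract does not say how the clusters number their nodes.

```python
from typing import List, Tuple

# Torus shape taken from the abstract: 4x8x8 = 256 nodes.
DIMS: Tuple[int, int, int] = (4, 8, 8)


def rank_to_coords(rank: int, dims: Tuple[int, int, int] = DIMS) -> Tuple[int, int, int]:
    """Map a node rank to (x, y, z), assuming row-major numbering (hypothetical)."""
    x, rem = divmod(rank, dims[1] * dims[2])
    y, z = divmod(rem, dims[2])
    return (x, y, z)


def coords_to_rank(coords: Tuple[int, int, int], dims: Tuple[int, int, int] = DIMS) -> int:
    """Inverse of rank_to_coords under the same assumed numbering."""
    x, y, z = coords
    return (x * dims[1] + y) * dims[2] + z


def torus_neighbors(rank: int, dims: Tuple[int, int, int] = DIMS) -> List[int]:
    """Return the ranks one hop away along each axis, with wraparound at the edges."""
    coords = rank_to_coords(rank, dims)
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            c = list(coords)
            # The modulo implements the torus wraparound link.
            c[axis] = (c[axis] + step) % dims[axis]
            neighbors.append(coords_to_rank((c[0], c[1], c[2]), dims))
    return neighbors


if __name__ == "__main__":
    print(torus_neighbors(0))
```

In a 3D torus every node has one neighbor in each direction along each of the three axes, so a node talks to six peers over point-to-point links rather than through a switch.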