A long-distance InfiniBand interconnection between two clusters in production use

  • Authors:
  • Sabine Richling;Heinz Kredel;Steffen Hau;Hans-Günther Kruse

  • Affiliations:
  • University of Heidelberg, Heidelberg, Germany;University of Mannheim, Mannheim, Germany;University of Mannheim, Mannheim, Germany;University of Mannheim, Mannheim, Germany

  • Venue:
  • State of the Practice Reports
  • Year:
  • 2011

Abstract

We discuss operational and organizational issues of an InfiniBand interconnection between two clusters over a distance of 28 km in day-to-day production use. We describe the setup of the hardware and networking components and how we resolved technical integration problems. We then present a federated authorization system for the cluster spanning our two participating universities, along with solutions to other organizational integration problems. We report performance measurements for MPI communication and for file access to Lustre storage systems. The results and a simple performance model show that MPI performance across the long-distance interconnection is intrinsically poor because of its limited bandwidth. However, file access and MPI communication among nodes on the same side are barely affected by the limitations of the interconnection, even at high load. Our organizational and technical setup allows the two clusters to be operated as a single system with lower administration costs and better load balance than in a disconnected setup.
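To make the intuition behind a "simple performance model" concrete, the following is a minimal sketch of a generic latency-plus-bandwidth model for a single message. It is not the model used by the authors, which the abstract does not specify. The 28 km distance comes from the abstract. The signal speed in fiber (about two thirds of the speed of light in vacuum) is an approximation, and the link bandwidth `ASSUMED_LINK_BANDWIDTH` is a purely hypothetical value chosen for illustration.

```python
# Illustrative latency/bandwidth model for one message over a long link.
# This is a generic textbook-style sketch, NOT the model from the paper.

FIBER_SPEED_M_PER_S = 2.0e8   # assumption: roughly 2/3 of c in optical fiber
DISTANCE_M = 28_000.0         # from the abstract: 28 km between the clusters
ASSUMED_LINK_BANDWIDTH = 1.0e9  # hypothetical: 1 GB/s, not a figure from the paper


def propagation_delay_s(distance_m: float) -> float:
    """One-way signal propagation time over a fiber of the given length."""
    return distance_m / FIBER_SPEED_M_PER_S


def transfer_time_s(message_bytes: float, latency_s: float,
                    bandwidth_bytes_per_s: float) -> float:
    """Time to deliver one message: fixed latency plus serialization time."""
    return latency_s + message_bytes / bandwidth_bytes_per_s


def effective_bandwidth(message_bytes: float, latency_s: float,
                        bandwidth_bytes_per_s: float) -> float:
    """Bytes per second actually achieved for a single message."""
    return message_bytes / transfer_time_s(
        message_bytes, latency_s, bandwidth_bytes_per_s)


# Propagation delay alone, ignoring switch and software overheads.
one_way_delay = propagation_delay_s(DISTANCE_M)

# Small messages are dominated by the propagation delay,
# large messages by the link bandwidth.
small_msg_bw = effective_bandwidth(1024, one_way_delay, ASSUMED_LINK_BANDWIDTH)
large_msg_bw = effective_bandwidth(1.0e9, one_way_delay, ASSUMED_LINK_BANDWIDTH)
```

Under these assumptions the one-way propagation delay alone is on the order of a hundred microseconds, far above typical latencies inside a single InfiniBand fabric. This illustrates why latency-sensitive MPI traffic suffers across the link, while traffic that stays on one side is unaffected by it.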