Segment-based routing: an efficient fault-tolerant routing algorithm for meshes and tori

  • Authors:
  • A. Mejia;J. Flich;J. Duato;Sven-Arne Reinemo;Tor Skeie

  • Affiliations:
  • Dpto de Informática de Sistemas y Computadores, Universidad Politécnica de Valencia, Valencia, Spain;Dpto de Informática de Sistemas y Computadores, Universidad Politécnica de Valencia, Valencia, Spain;Dpto de Informática de Sistemas y Computadores, Universidad Politécnica de Valencia, Valencia, Spain;Simula Research Laboratory, Lysaker, Norway;Simula Research Laboratory, Lysaker, Norway

  • Venue:
  • IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
  • Year:
  • 2006

Abstract

Computers get faster every year, but the demand for computing resources seems to grow at an even faster rate. Depending on the problem domain, this demand for more power can be satisfied by either massively parallel computers or clusters of computers. Common to both approaches is the dependence on high-performance interconnection networks such as Myrinet, InfiniBand, or 10 Gigabit Ethernet. While high throughput and low latency are key features of interconnection networks, the issue of fault tolerance is becoming increasingly important. As the number of network components grows, so does the probability of failure, so the fault-tolerance mechanisms of interconnection networks must also be considered. The main challenge then lies in combining performance and fault tolerance while still keeping cost and complexity low. This paper proposes a new deterministic routing methodology for tori and meshes that achieves high performance without the use of virtual channels. Furthermore, it is topology agnostic in nature, meaning that it can handle any topology derived from any combination of faults when combined with static reconfiguration. The algorithm, referred to as Segment-based Routing (SR), works by partitioning a topology into subnets, and subnets into segments. This allows bidirectional turn restrictions to be placed locally within a segment. Because segments are independent, the turn restriction in one segment can be placed independently of those in other segments, which gives a larger degree of freedom when placing turn restrictions than other routing strategies offer. The paper presents a way to compute segment-based routing tables and applies it to meshes and tori. Evaluation results show that SR increases performance by a factor of 1.8 over FX and up*/down* routing.
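To illustrate why a single bidirectional turn restriction inside a segment can be enough, the following minimal sketch treats one 4-switch ring as a lone segment and checks its channel dependency graph for cycles. This is not the paper's table-computation procedure: the function names (`bidirectional_restriction`, `channel_dependency_graph`, `has_cycle`), the ring topology, and the choice of switch 2 as the restricted switch are all assumptions made for the example.

```python
# Illustrative sketch only; names and topology are hypothetical, not from the paper.

def bidirectional_restriction(a, b, c):
    """Forbid the turn a -> b -> c at switch b, and its reverse c -> b -> a."""
    return {(a, b, c), (c, b, a)}

def channel_dependency_graph(links, forbidden_turns):
    """Build dependencies between directed channels.

    links: iterable of undirected links (u, v).
    A channel (a, b) depends on channel (b, c) when a packet may take the
    turn a -> b -> c, i.e. it is not a U-turn and not a forbidden turn.
    """
    channels = [(u, v) for u, v in links] + [(v, u) for u, v in links]
    deps = {ch: [] for ch in channels}
    for a, b in channels:
        for b2, c in channels:
            if b2 == b and c != a and (a, b, c) not in forbidden_turns:
                deps[(a, b)].append((b, c))
    return deps

def has_cycle(deps):
    """Depth-first search with three colours to detect a directed cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {node: WHITE for node in deps}

    def visit(node):
        colour[node] = GREY
        for nxt in deps[node]:
            if colour[nxt] == GREY:
                return True
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[node] == WHITE and visit(node) for node in deps)

# A 4-switch ring, treated here as a single segment (assumption).
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]

unrestricted = channel_dependency_graph(ring, set())
restricted = channel_dependency_graph(ring, bidirectional_restriction(1, 2, 3))

print(has_cycle(unrestricted))  # cyclic dependencies exist without restrictions
print(has_cycle(restricted))    # one bidirectional restriction breaks both directions
```

In this toy case the restriction at switch 2 removes the clockwise and counter-clockwise dependency cycles at once, while every pair of switches can still reach each other the other way around the ring. Any other switch in the ring would serve equally well, which is the local freedom of placement the abstract refers to.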