Immunet: A Cheap and Robust Fault-Tolerant Packet Routing Mechanism

  • Authors:
  • V. Puente; J. A. Gregorio; F. Vallejo; R. Beivide

  • Affiliations:
  • University of Cantabria, Spain (all four authors)

  • Venue:
  • Proceedings of the 31st Annual International Symposium on Computer Architecture
  • Year:
  • 2004

Abstract

A new and efficient mechanism to tolerate failures in interconnection networks for parallel and distributed computers, denoted as Immunet, is presented in this work. In the presence of failures, Immunet automatically reacts with a hardware reconfiguration of the surviving network resources. Immunet has four important advantages over previous fault-tolerant switching mechanisms. Its low hardware cost minimizes the overhead that the network must support in the absence of faults. As long as the network remains connected, Immunet can tolerate any number of failures, regardless of their spatial and temporal combinations. The resulting communication infrastructure provides optimized adaptive minimal routing over the surviving topology. Under successive failures, the system exhibits graceful performance degradation. Immunet reconfiguration can be totally transparent to the applications running on the parallel system, as they will only be affected by the loss of those data packets circulating through the broken components. The rest of the packets will suffer only a tolerable delay induced by the time needed to perform the automatic network reconfiguration. Descriptions of the hardware network architecture and detailed synthetic and execution-driven simulations demonstrate the benefits of Immunet.
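Two claims in the abstract can be illustrated in software: failures are tolerable as long as the network stays connected, and routing remains adaptive and minimal over the surviving topology. The sketch below is not Immunet's hardware reconfiguration mechanism, which the abstract does not detail. It assumes a hypothetical 2D mesh with link failures only. A breadth-first search from a destination over the surviving links (a) checks that every router can still reach that destination and (b) lists, at each router, every neighbor that is one hop closer. That list is the set of minimal adaptive output choices. The names `mesh`, `distances_to`, `is_connected`, and `minimal_next_hops` are illustrative only.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Set, Tuple

Node = Tuple[int, int]
Link = FrozenSet[Node]


def mesh(k: int) -> Tuple[List[Node], Set[Link]]:
    """Nodes and bidirectional links of a hypothetical k x k 2D mesh."""
    nodes = [(x, y) for x in range(k) for y in range(k)]
    links: Set[Link] = set()
    for x, y in nodes:
        if x + 1 < k:
            links.add(frozenset({(x, y), (x + 1, y)}))
        if y + 1 < k:
            links.add(frozenset({(x, y), (x, y + 1)}))
    return nodes, links


def distances_to(dest: Node, nodes: List[Node], links: Set[Link]):
    """BFS from dest over the surviving links only (assumption: only links fail).

    Returns (dist, adj): dist maps each node that can still reach dest to its
    minimal hop count; nodes cut off by failures are absent from dist.
    """
    adj: Dict[Node, List[Node]] = {n: [] for n in nodes}
    for link in links:
        a, b = tuple(link)
        adj[a].append(b)
        adj[b].append(a)
    dist: Dict[Node, int] = {dest: 0}
    queue = deque([dest])
    while queue:
        n = queue.popleft()
        for m in adj[n]:
            if m not in dist:
                dist[m] = dist[n] + 1
                queue.append(m)
    return dist, adj


def is_connected(dist: Dict[Node, int], nodes: List[Node]) -> bool:
    """True if every node can still reach the destination."""
    return len(dist) == len(nodes)


def minimal_next_hops(node: Node, dist: Dict[Node, int],
                      adj: Dict[Node, List[Node]]) -> List[Node]:
    """All neighbors one hop closer to dest on the surviving topology.

    Any of them keeps the path minimal, so an adaptive router could pick
    the least congested one. Empty if node is cut off or is dest itself.
    """
    if node not in dist:
        return []
    return sorted(m for m in adj[node] if dist.get(m) == dist[node] - 1)


if __name__ == "__main__":
    nodes, links = mesh(3)
    dest = (2, 2)
    dist, adj = distances_to(dest, nodes, links)
    print(dist[(0, 0)], minimal_next_hops((0, 0), dist, adj))
    # Fail one link: routing stays minimal over the surviving topology.
    surviving = links - {frozenset({(0, 0), (0, 1)})}
    dist, adj = distances_to(dest, nodes, surviving)
    print(dist[(0, 0)], minimal_next_hops((0, 0), dist, adj))
    # Fail (0, 0)'s remaining link: the network is no longer connected.
    surviving -= {frozenset({(0, 0), (1, 0)})}
    dist, adj = distances_to(dest, nodes, surviving)
    print(is_connected(dist, nodes), minimal_next_hops((0, 0), dist, adj))
```

Running the script prints `4 [(0, 1), (1, 0)]`, then `4 [(1, 0)]`, then `False []`. After the first failure, (0, 0) loses one of its two minimal choices but keeps a minimal path. After the second failure, it is disconnected, which is the case the abstract explicitly excludes. This centralized search only shows what information minimal adaptive routing needs. It does not show how Immunet obtains that information in hardware.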