Fault tolerant network on chip switching with graceful performance degradation

  • Authors:
  • Adán Kohler; Gert Schley; Martin Radetzki

  • Affiliations:
  • University of Stuttgart, Stuttgart, Germany; University of Stuttgart, Stuttgart, Germany; University of Stuttgart, Stuttgart, Germany

  • Venue:
  • IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems - Special issue on the 2009 ACM/IEEE international symposium on networks-on-chip
  • Year:
  • 2010

Abstract

The structural redundancy inherent in on-chip interconnection networks [networks on chip (NoC)] can be exploited by adaptive routing algorithms to provide connectivity even if network components are out of service due to faults, which will appear at an increasing rate with future chip technology nodes. This paper is based on a new, fine-grained functional fault model and a corresponding distributed fault diagnosis method that together make it possible to determine the fault status of individual NoC switches and their adjacent communication links. Whereas previous work on network fault tolerance assumes switches to be either available or fully out of service, we present a novel adaptive routing algorithm that employs the remaining functionality of partly defective switches. Using diagnostic information, transient faults are handled with a retransmission scheme that avoids the latency penalty of end-to-end repeat requests. Thereby, graceful degradation of NoC communication performance can be achieved even under high failure rates.
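
The idea of using the remaining functionality of a partly defective switch can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract describes a distributed method, while the code below is a centralized breadth-first search chosen only for clarity. The 2D mesh topology, the port names, the `route` function, and the representation of faults as broken (input port, output port) connections per switch are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical port model: four mesh directions plus "L" for the local core.
OPPOSITE = {"N": "S", "S": "N", "E": "W", "W": "E"}
STEP = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}


def route(width, height, src, dst, broken):
    """Search for a path in a width x height mesh (illustrative only).

    broken maps a switch coordinate (x, y) to the set of
    (in_port, out_port) connections assumed defective inside that
    switch; every other connection is assumed to work.
    Returns the list of visited switch coordinates, or None.
    """
    start = (src, "L")              # the packet enters at the local port
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        (x, y), in_port = state
        bad = broken.get((x, y), set())
        # Deliver if the connection to the local port still works.
        if (x, y) == dst and (in_port, "L") not in bad:
            path = []
            while state is not None:
                path.append(state[0])
                state = parent[state]
            return path[::-1]
        for out_port, (dx, dy) in STEP.items():
            if out_port == in_port or (in_port, out_port) in bad:
                continue            # no U-turns, no defective connections
            nx, ny = x + dx, y + dy
            if not (0 <= nx < width and 0 <= ny < height):
                continue            # stay inside the mesh
            nxt = ((nx, ny), OPPOSITE[out_port])
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None


# A switch with a broken west-to-north connection still forwards eastward.
print(route(3, 1, (0, 0), (2, 0), {(1, 0): {("W", "N")}}))
# → [(0, 0), (1, 0), (2, 0)]
```

Under a coarse model that marks switch (1, 0) as fully out of service, the one-row mesh in the usage line would be disconnected. With per-connection fault status, the switch remains usable for the west-to-east traffic that does not touch its defect.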