Fault-Tolerant Flow Control in On-chip Networks

  • Authors:
  • Young Hoon Kang; Taek-Jun Kwon; Jeffrey Draper

  • Venue:
  • NOCS '10 Proceedings of the 2010 Fourth ACM/IEEE International Symposium on Networks-on-Chip
  • Year:
  • 2010

Abstract

Interconnect scaling exacerbates the already challenging reliability of on-chip networks. Although researchers have proposed many fault-handling techniques for chip multiprocessors (CMPs), fault tolerance in the interconnection network has yet to evolve adequately. Because an end-to-end recovery approach delays fault detection and complicates recovery to a consistent global state in such a system, link-level retransmission is endorsed for recovery, which keeps the higher-level protocol simple. In this paper, we introduce a fault-tolerant flow control scheme for soft error handling in on-chip networks. The scheme recovers from errors at the link level by requesting retransmission and, by incorporating dynamic packet fragmentation, ensures error-free transmission on a per-flit basis. Dynamic packet fragmentation is adopted as part of the fault-tolerant flow control to disengage flits from fault containment and to recover the faulty flit transmission. The proposed router thus provides a high level of dependability at the link level for both the datapath and the control plane. In simulation with injected faults, the proposed router degrades gracefully while exhibiting 97% error coverage in datapath elements. The proposed router has been implemented using a TSMC 45nm standard cell library. Compared with a router that employs triple modular redundancy (TMR) in its datapath elements, the proposed router takes 58% less area and consumes 40% less energy per packet on average.
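
The general idea of link-level retransmission mentioned in the abstract can be illustrated with a small sketch. This is not the router design from the paper: the 8-bit payload, the single even-parity bit, the stop-and-wait retry loop, and all names (Flit, faulty_link, send_over_link) are assumptions made only for illustration, and dynamic packet fragmentation is not modeled.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Flit:
    payload: int  # 8-bit data word (width is an assumption)
    parity: int   # even-parity bit computed over the payload


def make_flit(payload: int) -> Flit:
    """Attach a parity bit so the receiver can detect a single bit flip."""
    return Flit(payload, bin(payload).count("1") % 2)


def is_valid(flit: Flit) -> bool:
    """Receiver-side check: recompute parity and compare."""
    return bin(flit.payload).count("1") % 2 == flit.parity


def faulty_link(flit: Flit, rng: random.Random, error_rate: float) -> Flit:
    """Model a soft error: flip one random payload bit with probability error_rate."""
    if rng.random() < error_rate:
        return Flit(flit.payload ^ (1 << rng.randrange(8)), flit.parity)
    return flit


def send_over_link(payloads, rng, error_rate=0.2, max_tries=100):
    """Send each flit over one hop, resending until the receiver accepts it."""
    received = []
    retransmissions = 0
    for payload in payloads:
        flit = make_flit(payload)  # sender keeps this copy until it is accepted
        for _ in range(max_tries):
            arrived = faulty_link(flit, rng, error_rate)
            if is_valid(arrived):
                received.append(arrived.payload)  # receiver accepts (ACK)
                break
            retransmissions += 1  # receiver rejects (NACK), sender resends
        else:
            raise RuntimeError("link appears permanently faulty")
    return received, retransmissions


if __name__ == "__main__":
    data = [0x00, 0xFF, 0x5A, 0x13]
    out, retries = send_over_link(data, random.Random(1))
    print(out == data, retries)
```

Because the error is caught and corrected on the hop where it occurs, the corrupted flit never propagates further into the network, which is the property that lets the higher-level protocol stay simple. A real link would use a stronger code than single parity, since parity misses any even number of bit flips.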