On the design and analysis of fault tolerant NoC architecture using spare routers

  • Authors:
  • Yung-Chang Chang;Ching-Te Chiu;Shih-Yin Lin;Chung-Kai Liu

  • Affiliations:
  • Industrial Technology Research Institute, Hsinchu, Taiwan, R.O.C.;National Tsing Hua University, Hsinchu, Taiwan, R.O.C.;Industrial Technology Research Institute, Hsinchu, Taiwan, R.O.C.;Industrial Technology Research Institute, Hsinchu, Taiwan, R.O.C.

  • Venue:
  • Proceedings of the 16th Asia and South Pacific Design Automation Conference
  • Year:
  • 2011

Abstract

The aggressive advance of VLSI manufacturing technology has had a dramatic impact on the dependability of devices and interconnects. In modern manycore systems, mesh-based Networks-on-Chip (NoC) are widely adopted as the on-chip communication infrastructure, so it is critical to provide an effective fault tolerance scheme for mesh-based NoCs. A faulty router or broken link can isolate an otherwise functional processing element (PE). Moreover, a set of faulty routers can form faulty regions that may break down the whole design. To address these issues, we propose a router-level fault tolerance scheme with spare routers, which differs from the traditional microarchitecture-level approach. The spare routers not only provide redundancy but also diversify the connection paths between adjacent routers. To exploit these resources for fault tolerance, we present two configuration algorithms: shift-and-replace-allocation (SARA), and defect-awareness-path-allocation (DAPA), which takes advantage of the path diversity in our architecture. The proposed design is transparent to any routing algorithm, since the output topology is consistent with the original mesh. Experimental results show that our scheme substantially improves fault tolerance metrics, including reliability, mean time to failure (MTTF), and yield. In addition, the performance of the spare routers improves as the NoC size grows, while the relative connection cost decreases. This rare and valuable characteristic makes our solution suitable for large-scale NoC design.
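To give a rough sense of why spare routers raise reliability and MTTF, the sketch below uses a textbook k-out-of-n model rather than the authors' own analysis. It assumes (our assumptions, not the paper's) that a group of k required routers shares s spares, that any spare can replace any faulty router in the group (ignoring the placement and connection constraints that SARA and DAPA must respect), that routers fail independently, and, for MTTF, that lifetimes are exponential and failed routers are never repaired. The function names are hypothetical.

```python
from math import comb


def k_of_n_reliability(k: int, n: int, r: float) -> float:
    """Probability that at least k of n identical routers still work.

    Assumes independent failures, each router working with probability r.
    Idealization: any surviving router (original or spare) can fill any slot.
    """
    return sum(comb(n, i) * r**i * (1 - r) ** (n - i) for i in range(k, n + 1))


def k_of_n_mttf(k: int, n: int, failure_rate: float) -> float:
    """MTTF of a non-repairable k-out-of-n group with exponential lifetimes.

    With i routers alive, the time to the next failure is exponential with
    rate i * failure_rate. The group fails once fewer than k routers remain,
    i.e. after leaving the states i = n, n-1, ..., k in turn.
    """
    return sum(1.0 / (i * failure_rate) for i in range(k, n + 1))


if __name__ == "__main__":
    # Hypothetical group: 4 routers needed, per-router reliability 0.9, rate 1.0.
    for spares in (0, 1, 2):
        n = 4 + spares
        print(spares, k_of_n_reliability(4, n, 0.9), k_of_n_mttf(4, n, 1.0))
```

Under these assumptions, one spare lifts the group reliability from 0.9^4 = 0.6561 to about 0.9185, and raises the MTTF from 1/4 to 1/4 + 1/5 = 0.45 (in units of 1/failure_rate). A real spare-router mesh will do worse than this idealized bound wherever a spare cannot reach the faulty position.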