An adaptive low-overhead mechanism for dependable general-purpose many-core processors

  • Authors:
  • Wentao Jia;Rui Li;Chunyan Zhang

  • Affiliations:
  • School of Computer, National University of Defense Technology, China;School of Computer, National University of Defense Technology, China;School of Computer, National University of Defense Technology, China

  • Venue:
  • ICT-EurAsia'13 Proceedings of the 2013 international conference on Information and Communication Technology
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

Future many-core processors may contain more than 1000 cores on single die. However, continued scaling of silicon fabrication technology exposes chip orders of such magnitude to a higher vulnerability to errors. A low-overhead and adaptive fault-tolerance mechanism is desired for general-purpose many-core processors. We propose high-level adaptive redundancy (HLAR), which possesses several unique properties. First, the technique employs selective redundancy based application assistance and dynamically cores schedule. Second, the method requires minimal overhead when the mechanism is disabled. Third, it expands the local memory within the replication sphere, which heightens the replication level and simplifies the redundancy mechanism. Finally, it decreases bandwidth through various compression methods, thus effectively balancing reliability, performance, and power. Experimental results show a remarkably low overhead while covering 99.999% errors with only 0.25% more networks-on-chip traffic.