Functional correctness for CMP interconnects

  • Authors:
  • Rawan Abdel-Khalek;Ritesh Parikh;Andrew DeOrio;Valeria Bertacco

  • Affiliations:
  • Department of Computer Science and Engineering, University of Michigan, USA;Department of Computer Science and Engineering, University of Michigan, USA;Department of Computer Science and Engineering, University of Michigan, USA;Department of Computer Science and Engineering, University of Michigan, USA

  • Venue:
  • ICCD '11 Proceedings of the 2011 IEEE 29th International Conference on Computer Design
  • Year:
  • 2011

Abstract

As transistor counts continue to scale, modern designs are transitioning towards large chip multi-processors (CMPs). To match the advancing performance of CMPs, on-chip interconnects are becoming increasingly complex, commonly deploying advanced network-on-chip (NoC) structures. Ensuring the correct operation of these system-level infrastructures has become increasingly difficult, and to keep functional design errors from escaping into the final product, mechanisms are needed to safeguard communication integrity at runtime. In this paper, we propose SafeNoC, an end-to-end error detection and recovery solution to ensure the functional correctness of CMP interconnects. SafeNoC augments the existing interconnect with a simple, lightweight checker network that is guaranteed to deliver messages correctly. For each data message sent over the primary NoC, a look-ahead signature is transmitted over the checker network and is used to detect errors in the corresponding data message. If a functional communication bug is detected, a novel recovery algorithm reconstructs the data that was in flight when the error occurred, ensuring that it reaches the intended destination. In our experiments, we found that SafeNoC can recover from a wide variety of errors, with almost no performance impact in the absence of errors. A lightweight solution, SafeNoC incurs a 2.41% area overhead in a 64-core CMP, 7× smaller than common retransmission-based approaches.
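
The end-to-end detection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how signatures are computed or matched, so the use of CRC-32 as the signature function, the `msg_id` matching key, and the `Receiver` class are all assumptions made for illustration. The recovery algorithm is not modeled.

```python
import zlib


def signature(payload: bytes) -> int:
    """Compute a compact signature of a data message.

    Assumption: CRC-32 stands in for the signature function,
    which the abstract does not specify.
    """
    return zlib.crc32(payload) & 0xFFFFFFFF


class Receiver:
    """Hypothetical destination node that checks data against signatures."""

    def __init__(self) -> None:
        # msg_id -> signature delivered over the (reliable) checker network
        self.expected: dict[int, int] = {}

    def on_signature(self, msg_id: int, sig: int) -> None:
        # The look-ahead signature is recorded when it arrives.
        self.expected[msg_id] = sig

    def on_data(self, msg_id: int, payload: bytes) -> bool:
        """Return True if the data message matches its recorded signature.

        Simplification: data with no recorded signature is treated as an
        error; a real design would have to handle ordering and timeouts.
        """
        sig = self.expected.pop(msg_id, None)
        return sig is not None and sig == signature(payload)


# Usage: one clean delivery and one corrupted delivery.
rx = Receiver()
rx.on_signature(1, signature(b"cache line A"))
ok = rx.on_data(1, b"cache line A")        # intact message

rx.on_signature(2, signature(b"cache line B"))
corrupted = rx.on_data(2, b"cache line X")  # payload altered in the primary NoC
```

In this sketch a failed check only reports the mismatch; in SafeNoC, a detected error would instead trigger the recovery procedure that reconstructs the in-flight data.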