A novel NoC-based design for fault-tolerance of last-level caches in CMPs

  • Authors:
  • Abbas BanaiyanMofrad; Gustavo Girão; Nikil Dutt

  • Affiliations:
  • University of California, Irvine, Irvine, CA, USA; Federal University of Rio Grande do Sul, Porto Alegre, Brazil; University of California, Irvine, Irvine, CA, USA

  • Venue:
  • Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
  • Year:
  • 2012

Abstract

Advances in technology scaling, coupled with aggressive voltage scaling, result in significant reliability challenges for emerging Chip Multiprocessor (CMP) platforms, where error-prone caches continue to dominate the chip area. Network-on-Chip (NoC) fabrics are increasingly used to manage the scalability of these CMPs. We present a novel fault-tolerant scheme for the Last-Level Cache (LLC) in CMP architectures that leverages the interconnection network to protect the LLC banks against permanent faults. During an LLC access to a faulty area, the network detects and corrects the faults, returning fault-free data to the requesting core. By leveraging the NoC interconnection fabric, we can implement any cache fault-tolerant scheme in an efficient, modular, and scalable manner. We perform extensive design space exploration on NoC benchmarks to demonstrate the utility and efficacy of our approach. The overheads of leveraging the NoC fabric are minimal: on an 8-core, 16-cache-bank CMP, we demonstrate reliable access to the LLC with additional overheads of less than 3% in area and less than 7% in power.
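The abstract does not specify the concrete protection scheme the network applies. The sketch below is only an illustration of the general idea that a network-side component can hide permanent bank faults from the requesting core. It assumes a word-granularity fault map and a spare store holding correct copies of words that fall into faulty cells. The class names `FaultyBank` and `NocFaultHandler` and the stuck-at fault model are hypothetical and are not the authors' design.

```python
from dataclasses import dataclass, field


@dataclass
class FaultyBank:
    """Hypothetical LLC bank model whose faulty word cells always return a fixed value."""
    stuck_at: dict  # (line_addr, word_idx) -> value the broken cell always returns
    lines: dict = field(default_factory=dict)  # line_addr -> list of words

    def write(self, addr, words):
        self.lines[addr] = list(words)

    def read(self, addr):
        data = list(self.lines[addr])
        for (line, idx), value in self.stuck_at.items():
            if line == addr:
                data[idx] = value  # the permanent fault corrupts this word
        return data


@dataclass
class NocFaultHandler:
    """Hypothetical network-side component that masks known bank faults."""
    fault_map: set  # (line_addr, word_idx) pairs known to be faulty
    spare: dict = field(default_factory=dict)  # (line_addr, word_idx) -> correct word

    def on_write(self, bank, addr, words):
        # Keep a fault-free copy of every word that lands in a faulty cell.
        for idx, word in enumerate(words):
            if (addr, idx) in self.fault_map:
                self.spare[(addr, idx)] = word
        bank.write(addr, words)

    def on_read(self, bank, addr):
        data = bank.read(addr)
        # Patch faulty words in the reply before it reaches the requesting core.
        for idx in range(len(data)):
            if (addr, idx) in self.fault_map:
                data[idx] = self.spare[(addr, idx)]
        return data


bank = FaultyBank(stuck_at={(0x40, 2): 0})
noc = NocFaultHandler(fault_map={(0x40, 2)})
noc.on_write(bank, 0x40, [11, 22, 33, 44])
print(bank.read(0x40))         # raw bank contents: [11, 22, 0, 44]
print(noc.on_read(bank, 0x40)) # corrected reply:   [11, 22, 33, 44]
```

Because the correction lives in the network component rather than in the bank itself, a different fault-tolerant scheme could be substituted by swapping only `NocFaultHandler`. This mirrors the modularity argument in the abstract, though the real mechanism may differ.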