Reducing cache power with low-cost, multi-bit error-correcting codes

  • Authors:
  • Chris Wilkerson;Alaa R. Alameldeen;Zeshan Chishti;Wei Wu;Dinesh Somasekhar;Shih-lien Lu

  • Affiliations:
  • Intel Laboratories, Hillsboro, USA;Intel Laboratories, Hillsboro, USA;Intel Laboratories, Hillsbor, USA;Intel Laboratories, Hillsboro, USA;Intel Laboratories, Hillsboro, USA;Intel Laboratories, Hillsboro, USA

  • Venue:
  • Proceedings of the 37th annual international symposium on Computer architecture
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

Technology advancements have enabled the integration of large on-die embedded DRAM (eDRAM) caches. eDRAM is significantly denser than traditional SRAMs, but must be periodically refreshed to retain data. Like SRAM, eDRAM is susceptible to device variations, which play a role in determining refresh time for eDRAM cells. Refresh power potentially represents a large fraction of overall system power, particularly during low-power states when the CPU is idle. Future designs need to reduce cache power without incurring the high cost of flushing cache data when entering low-power states. In this paper, we show the significant impact of variations on refresh time and cache power consumption for large eDRAM caches. We propose Hi-ECC, a technique that incorporates multi-bit error-correcting codes to significantly reduce refresh rate. Multi-bit error-correcting codes usually have a complex decoder design and high storage cost. Hi-ECC avoids the decoder complexity by using strong ECC codes to identify and disable sections of the cache with multi-bit failures, while providing efficient single-bit error correction for the common case. Hi-ECC includes additional optimizations that allow us to amortize the storage cost of the code over large data words, providing the benefit of multi-bit correction at same storage cost as a single-bit error-correcting (SECDED) code (2% overhead). Our proposal achieves a 93% reduction in refresh power vs. a baseline eDRAM cache without error correcting capability, and a 66% reduction in refresh power vs. a system using SECDED codes.