Resilient die-stacked DRAM caches

  • Authors:
  • Jaewoong Sim; Gabriel H. Loh; Vilas Sridharan; Mike O'Connor

  • Affiliations:
  • Georgia Institute of Technology; AMD Research, Advanced Micro Devices, Inc.; RAS Architecture, Advanced Micro Devices, Inc.; AMD Research, Advanced Micro Devices, Inc.

  • Venue:
  • Proceedings of the 40th Annual International Symposium on Computer Architecture
  • Year:
  • 2013

Abstract

Die-stacked DRAM can provide large amounts of in-package, high-bandwidth cache storage. For server and high-performance computing markets, however, such DRAM caches must also provide sufficient support for reliability and fault tolerance. While conventional off-chip memory provides ECC support by adding one or more extra chips, this may not be practical in a 3D stack. In this paper, we present a DRAM cache organization that uses error-correcting codes (ECCs), strong checksums (CRCs), and dirty data duplication to detect and correct a wide range of stacked DRAM failures, from traditional bit errors to large-scale row, column, bank, and channel failures. With only a modest performance degradation compared to a DRAM cache with no ECC support, our proposal can correct all single-bit failures, and 99.9993% of all row, column, and bank failures, providing more than a 54,000x improvement in the FIT rate of silent-data corruptions compared to basic SECDED ECC protection.
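As a rough illustration of how a checksum and dirty data duplication can work together, the following Python sketch models a toy cache. It is not the paper's design: the class `ToyDramCache`, its dictionaries, and the use of `zlib.crc32` as the checksum are all assumptions made for the example, and ECC is left out entirely. The sketch also assumes that a corrupted clean block can simply be refetched from main memory, whereas a corrupted dirty block must be recovered from its duplicate.

```python
import zlib


class ToyDramCache:
    """Toy model (hypothetical): CRC-checked blocks, dirty blocks stored twice."""

    def __init__(self, backing_store):
        self.backing = backing_store  # stand-in for main memory: addr -> bytes
        self.primary = {}             # addr -> (data, crc, dirty)
        self.duplicate = {}           # addr -> (data, crc), dirty blocks only

    def fill(self, addr):
        # Bring a clean copy in from main memory and record its checksum.
        data = self.backing[addr]
        self.primary[addr] = (data, zlib.crc32(data), False)

    def write(self, addr, data):
        # A dirty block is the only up-to-date copy, so it is stored twice.
        crc = zlib.crc32(data)
        self.primary[addr] = (data, crc, True)
        self.duplicate[addr] = (data, crc)

    def read(self, addr):
        if addr not in self.primary:
            self.fill(addr)
        data, crc, dirty = self.primary[addr]
        if zlib.crc32(data) == crc:
            return data  # checksum matches: no error detected
        if dirty:
            # Recover a corrupted dirty block from its duplicate.
            dup_data, dup_crc = self.duplicate[addr]
            if zlib.crc32(dup_data) != dup_crc:
                raise RuntimeError("uncorrectable: both copies failed the CRC")
            self.primary[addr] = (dup_data, dup_crc, True)
            return dup_data
        # A corrupted clean block is treated as a miss and refetched.
        self.fill(addr)
        return self.primary[addr][0]


def flip_first_bit(data):
    """Inject a single-bit error into a block (test helper)."""
    return bytes([data[0] ^ 1]) + data[1:]
```

In this model, flipping a bit in a clean block causes a refetch from `backing_store`, while flipping a bit in a dirty block causes the read to fall back to the duplicate. A real design would also place the duplicate in a different bank or channel so that a single large-scale failure cannot destroy both copies; the dictionaries here do not capture that placement.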