Reducing Data Cache Susceptibility to Soft Errors

  • Authors:
  • Vilas Sridharan;Hossein Asadi;Mehdi B. Tahoori;David Kaeli

  • Affiliations:
  • IEEE;IEEE;IEEE;IEEE

  • Venue:
  • IEEE Transactions on Dependable and Secure Computing
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

Data caches are a fundamental component of most modern microprocessors. They provide for efficient read/write access to data memory. Errors occurring in the data cache can corrupt data values or state, and can easily propagate throughout the memory hierarchy. One of the main threats to data cache reliability is soft (transient, nonreproducible) errors. These errors can occur more often than hard (permanent) errors, and most often arise from Single Event Upsets (SEUs) caused by strikes from energetic particles such as neutrons and alpha particles. Many protection techniques exist for data caches; the most common are ECC (Error Correcting Codes) and parity. These protection techniques detect all single bit errors and, in the case of ECC, correct them. To make proper design decisions about which protection technique to use, accurate design-time modeling of cache reliability is crucial. In addition, as caches increase in storage capacity, another important goal is to reduce the failure rate of a cache, to limit disruption to normal system operation. In this paper, we present our modeling approach for assessing the impact of soft errors using architectural simulators. We also describe a new technique for reducing the vulnerability of data caches: refetching. By selectively refetching cache lines from the ECC-protected L2 cache, we can significantly reduce the vulnerability of the L1 data cache. We discuss and present results for two different algorithms that perform selective refetch. Experimental results show that we can obtain an 85 percent decrease in vulnerability when running the SPEC2K benchmark suite while only experiencing a slight decrease in performance. Our results demonstrate that selective refetch can cost-effectively decrease the error rate of an L1 data cache.