Do We Need Anything More Than Single Bit Error Correction (ECC)?

  • Authors:
  • Affiliations:
  • Venue:
  • MTDT '04 Proceedings of the Records of the 2004 International Workshop on Memory Technology, Design and Testing
  • Year:
  • 2004

Quantified Score

Hi-index 0.00

Visualization

Abstract

For a long time, single bit error correction (with double bit error detection) has been the mainstay ECC technology for covering soft errors in the cache. From the soft error rate that has been observed (at least terrestrially), people have been content with what single bit correction can offer. For the rare occasion that a double error occurs, ECC will also be able to alert the system and result in a graceful shutdown or otherwise. However, things are changing. As technology scaling continues, we are approaching the point where we will have a billion transistors on a single piece of silicon, with a big part of this budget as memory elements. In a system, the number of memory bits is also on the rise. The scaled technology also brings with it many variations and sensitivities that can cause memory cells to function improperly, or may not function well at certain environmental conditions. Increasingly, ECC is no longer serving as just radiation induced soft error correction, but may be able to affect other forms of fault corrections as well. Will ECC be able to serve this multi-faceted role? Do we need more than single bit error correction? Can we afford the cost of multiple bit error correction? Should we need it? This paper will attempt to answer some of these questions and raise issues with the status quo.