Cache Scrubbing in Microprocessors: Myth or Necessity?

Authors:
Shubhendu S. Mukherjee;Joel Emer;Tryggve Fossum;Steven K. Reinhardt
Venue:
PRDC '04 Proceedings of the 10th IEEE Pacific Rim International Symposium on Dependable Computing (PRDC'04)
Year:
2004

Abstract

Transient faults from neutron and alpha particle strikes in large SRAM caches have become a major problem for microprocessor designers. To protect these caches, designers often use error correcting codes (ECC), which typically provide single-bit error correction and double-bit error detection (SECDED). Unfortunately, two separate strikes could still flip two different bits in the same ECC-protected word. This we call a temporal double-bit error. SECDED ECC can only detect, not correct, such errors. This paper shows how to compute the mean time to failure for temporal double-bit errors. Additionally, we show how fixed-interval scrubbing (in which error checkers periodically access cache blocks and remove single-bit errors) can mitigate such errors in processor caches. Our analysis using current soft error rates shows that only very large caches (e.g., hundreds of megabytes to gigabytes) need scrubbing to reduce the temporal double-bit error rate to a tolerable range.
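
To give a feel for the kind of estimate the abstract describes, the following is a minimal back-of-the-envelope sketch in Python. It is not the paper's derivation: it assumes strikes arrive as a Poisson process spread uniformly over all bits, that nothing other than the scrubber ever clears a single-bit error, and that the scrub interval is much shorter than the unscrubbed mean time to failure. The function names, the 72-bit ECC word (64 data bits plus 8 check bits), and the 0.001 FIT/bit raw error rate are illustrative assumptions, not values taken from this record.

```python
import math


def words_in_cache(cache_bytes, data_bits_per_word=64):
    """Number of ECC-protected words, assuming 64 data bits per word."""
    return cache_bytes * 8 // data_bits_per_word


def mttf_no_scrub_hours(num_words, bits_per_word, fit_per_bit):
    """Birthday-style estimate of the time until two strikes hit one word.

    Assumption: single-bit errors are never cleaned up, so the first word
    to collect two strikes fails. With strikes spread uniformly over
    num_words words, the expected number of strikes before the first
    repeat is about sqrt(pi * num_words / 2) for large num_words.
    """
    # FIT = failures per 1e9 hours; convert to strikes per hour, whole cache.
    strike_rate = num_words * bits_per_word * fit_per_bit / 1e9
    expected_strikes = math.sqrt(math.pi * num_words / 2)
    return expected_strikes / strike_rate


def mttf_with_scrub_hours(num_words, bits_per_word, fit_per_bit,
                          scrub_interval_hours):
    """Estimate with fixed-interval scrubbing.

    Assumption: every word is checked and corrected once per interval, so
    a failure needs two strikes on one word inside a single interval.
    For a tiny mean strike count x per word per interval, the Poisson
    probability of two or more strikes is about x**2 / 2. The leading
    term is used directly because the exact expression underflows in
    double precision at these rates.
    """
    word_rate = bits_per_word * fit_per_bit / 1e9  # strikes/hour/word
    x = word_rate * scrub_interval_hours
    p_fail_per_interval = num_words * x * x / 2
    return scrub_interval_hours / p_fail_per_interval


if __name__ == "__main__":
    HOURS_PER_YEAR = 24 * 365
    for megabytes in (32, 1024, 16 * 1024):
        n = words_in_cache(megabytes * 2**20)
        plain = mttf_no_scrub_hours(n, 72, 0.001)
        daily = mttf_with_scrub_hours(n, 72, 0.001, 24.0)
        print(f"{megabytes:>6} MB  no scrub: {plain / HOURS_PER_YEAR:.3g} y"
              f"  daily scrub: {daily / HOURS_PER_YEAR:.3g} y")
```

Under these assumptions the scrubbed estimate scales inversely with the scrub interval and with the number of words, while the unscrubbed estimate falls only with the square root of the cache size. Neither function models same-bit double strikes or cache accesses that happen to correct an error early, so the numbers are best read as rough orders of magnitude.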