Techniques to Reduce the Soft Error Rate of a High-Performance Microprocessor
Proceedings of the 31st annual international symposium on Computer architecture
SWIFT: Software Implemented Fault Tolerance
Proceedings of the international symposium on Code generation and optimization
Mitigating Soft Errors in Highly Associative Cache with CAM-based Tag
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Mitigating soft error failures for multimedia applications by selective data protection
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
I-cache multi-banking and vertical interleaving
Proceedings of the 17th ACM Great Lakes symposium on VLSI
Examining ACE analysis reliability estimates using fault-injection
Proceedings of the 34th annual international symposium on Computer architecture
Techniques for Efficient Software Checking
Languages and Compilers for Parallel Computing
Adopting the Drowsy Technique for Instruction Caches: A Soft Error Perspective
IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
DRAM errors in the wild: a large-scale field study
Proceedings of the eleventh international joint conference on Measurement and modeling of computer systems
Architecture Design for Soft Errors
Architecture Design for Soft Errors
Cache vulnerability equations for protecting data in embedded processor caches from soft errors
Proceedings of the ACM SIGPLAN/SIGBED 2010 conference on Languages, compilers, and tools for embedded systems
Minimizing soft errors in TCAM devices: a probabilistic approach to determining scrubbing intervals
IEEE Transactions on Circuits and Systems Part I: Regular Papers
Partitioning techniques for partially protected caches in resource-constrained embedded systems
ACM Transactions on Design Automation of Electronic Systems (TODAES)
DRAM errors in the wild: a large-scale field study
Communications of the ACM
Optimizing power and performance for reliable on-chip networks
Proceedings of the 2010 Asia and South Pacific Design Automation Conference
Radiation-induced Soft Errors: A Chip-level Modeling Perspective
Foundations and Trends in Electronic Design Automation
Soft errors issues in low-power caches
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A framework for correction of multi-bit soft errors in L2 caches based on redundancy
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Soft error benchmarking of L2 caches with PARMA
Proceedings of the ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Soft error benchmarking of L2 caches with PARMA
ACM SIGMETRICS Performance Evaluation Review - Performance evaluation review
Smart cache cleaning: energy efficient vulnerability reduction in embedded processors
CASES '11 Proceedings of the 14th international conference on Compilers, architectures and synthesis for embedded systems
Setting an error detection infrastructure with low cost acoustic wave detectors
Proceedings of the 39th Annual International Symposium on Computer Architecture
A study of DRAM failures in the field
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
An error tolerant CAM with nand match-line organization
Proceedings of the 23rd ACM international conference on Great lakes symposium on VLSI
Exploring DRAM organizations for energy-efficient and resilient exascale memories
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Enabling energy efficient reliability in embedded systems through smart cache cleaning
ACM Transactions on Design Automation of Electronic Systems (TODAES) - Special Section on Networks on Chip: Architecture, Tools, and Methodologies
Journal of Network and Systems Management
Hi-index | 0.02 |
Transient faults from neutron and alpha particle strikes in large SRAM caches have become a major problem for microprocessor designers. To protect these caches, designers often use error correcting codes (ECC), which typically provide single-bit error correction and double-bit error detection (SECDED). Unfortunately, two separate strikes could still flip two different bits in the same ECC-protected word. This we call a temporal double-bit error. SECDED ECC can only detect 驴 not correct 驴 such errors.This paper shows how to compute the mean time to failure for temporal double-bit errors. Additionally, we show how fixed-interval scrubbing 驴 in which error checkers periodically access cache blocks and remove single-bit errors 驴 can mitigate such errors in processor caches. Our analysis using current soft error rates shows that only very large caches (e.g., hundreds of megabytes to gigabytes) need scrubbing to reduce the temporal double-bit error rate to a tolerable range.