Modeling soft errors for data caches and alleviating their effects on data reliability

Authors:
Ismail Kadayif;Hande Sen;Selcuk Koyuncu
Affiliations:
Canakkale Onsekiz Mart University, Canakkale 17100, Turkey;Canakkale Onsekiz Mart University, Canakkale 17100, Turkey;Canakkale Onsekiz Mart University, Canakkale 17100, Turkey
Venue:
Microprocessors & Microsystems
Year:
2010

Abstract

Soft errors caused by energetic particle strikes pose a significant reliability concern for computing systems. In this study, we first introduce a model for soft error occurrence and propagation in cache memories. Based on this model, we define a metric called the Architectural Vulnerability Factor for Caches (AVFC), which represents the probability that a fault in the cache becomes visible in the final output of the program. We then propose three architectural schemes for improving reliability. The first scheme prevents an error from propagating to lower levels of the memory hierarchy by not forwarding the unmodified data words of dirty cache blocks to the L2 cache at write-backs. The second scheme selectively invalidates cache blocks to reduce their vulnerable periods. To reduce the performance overhead caused by block invalidation, the third scheme tries to bring a fresh copy of the invalidated block into the cache via prefetching. Experimental results for the SPEC2000 suite show that, under the proposed model, the first and third schemes together improve data reliability by roughly 96% at a cost of less than 1% in execution time, considerably more than the improvement achieved by either of two well-known techniques, namely write-through and early write-back caches.
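
The first scheme can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes per-word dirty bits, and the names `CacheBlock`, `store`, and `write_back` are hypothetical, as is the 8-word block. A bit flip in an unmodified word of a dirty block is not forwarded at write-back, so the L2 copy of that word stays intact.

```python
class CacheBlock:
    """Toy L1 block with one dirty bit per word (an assumed organization)."""

    def __init__(self, words):
        self.words = list(words)
        self.dirty = [False] * len(self.words)

    def store(self, index, value):
        # A CPU store modifies the word and marks only that word dirty.
        self.words[index] = value
        self.dirty[index] = True


def write_back(block, l2_block):
    """Forward only modified words to the L2 copy; return how many were sent."""
    forwarded = 0
    for i, is_dirty in enumerate(block.dirty):
        if is_dirty:
            l2_block[i] = block.words[i]
            forwarded += 1
    block.dirty = [False] * len(block.dirty)
    return forwarded


# L2 holds the clean copy of an 8-word block (illustrative values).
l2 = [10, 11, 12, 13, 14, 15, 16, 17]
l1 = CacheBlock(l2)

l1.store(2, 99)          # legitimate program store: word 2 becomes dirty
l1.words[5] ^= 1 << 3    # simulated soft error in an unmodified word (15 -> 7)

forwarded = write_back(l1, l2)
print(forwarded, l2)     # prints: 1 [10, 11, 99, 13, 14, 15, 16, 17]
```

In this sketch the corrupted word 5 never reaches L2 because its dirty bit is clear. An error in a word the program did modify would still be written back, so the sketch covers only faults in unmodified words.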