Soft errors caused by energetic-particle strikes pose a significant reliability concern for computing systems. In this study, we first introduce a model for soft error occurrence and propagation in cache memories. Based on this model, we define a metric called Architectural Vulnerability Factor for Caches (AVFC), which represents the probability that a fault in the cache becomes visible in the final output of the program. We then propose three architectural schemes for improving reliability. The first scheme prevents an error from propagating to lower levels of the memory hierarchy by not forwarding the unmodified data words of dirty cache blocks to the L2 cache at write-backs. The second scheme selectively invalidates cache blocks to reduce their vulnerable periods. To reduce the performance overhead caused by block invalidation, the third scheme tries to bring a fresh copy of the invalidated block into the cache via prefetching. Experimental results for the SPEC2000 suite show that, under the proposed model, the first and third schemes together improve data reliability by roughly 96% at the cost of less than 1% overhead in execution time, considerably more than the improvement achieved by either of two well-known techniques, namely write-through and early write-back cache mechanisms.
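As an illustration of the first scheme only, the following is a minimal sketch of a write-back that forwards just the modified words of a dirty block. It is not the authors' implementation: the per-word dirty bits, the block size `WORDS_PER_BLOCK`, and the names `CacheBlock` and `write_back` are assumptions made for this example, and the L2 cache is modeled as a plain dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List

WORDS_PER_BLOCK = 8  # assumed block size in words (hypothetical value)


@dataclass
class CacheBlock:
    """Toy L1 block with one dirty bit per word (an assumption of this sketch)."""
    tag: int
    words: List[int]
    dirty: List[bool] = field(default_factory=lambda: [False] * WORDS_PER_BLOCK)

    def write(self, offset: int, value: int) -> None:
        # A store updates the word and marks only that word as modified.
        self.words[offset] = value
        self.dirty[offset] = True


def write_back(block: CacheBlock, l2: Dict[int, List[int]]) -> int:
    """Forward only the modified words to the L2 copy.

    Unmodified words are not forwarded, so a bit flip in one of them
    cannot overwrite the clean copy held in L2.
    Returns the number of words forwarded.
    """
    l2_block = l2.setdefault(block.tag, [0] * WORDS_PER_BLOCK)
    forwarded = 0
    for offset, is_dirty in enumerate(block.dirty):
        if is_dirty:
            l2_block[offset] = block.words[offset]
            forwarded += 1
    block.dirty = [False] * WORDS_PER_BLOCK
    return forwarded


def demo() -> List[int]:
    l2 = {7: list(range(10, 18))}                # L2 holds the clean block
    blk = CacheBlock(tag=7, words=list(l2[7]))   # block fetched into L1
    blk.write(2, 99)                             # program modifies word 2
    blk.words[5] ^= 1                            # simulated soft error in unmodified word 5
    write_back(blk, l2)
    return l2[7]


if __name__ == "__main__":
    print(demo())
```

In this toy model the corrupted word 5 never reaches L2, whereas a conventional write-back of the whole dirty block would have copied it. A strike on a modified word would still propagate, which is the exposure the second and third schemes aim to shorten.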