The SimpleScalar tool set, version 2.0
ACM SIGARCH Computer Architecture News
Area efficient architectures for information integrity in cache memories
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
Eager writeback - a technique for improving bandwidth utilization
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Cache decay: exploiting generational behavior to reduce cache leakage power
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
The MIPS R10000 Superscalar Microprocessor
IEEE Micro
Design Challenges of Technology Scaling
IEEE Micro
Power4 System Design for High Reliability
IEEE Micro
A dynamic cache sub-block design to reduce false sharing
ICCD '95 Proceedings of the 1995 International Conference on Computer Design: VLSI in Computers and Processors
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Dynamically Exploiting Narrow Width Operands to Improve Processor Power and Performance
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Performance, energy, and reliability tradeoffs in replicating hot cache lines
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Cache Scrubbing in Microprocessors: Myth or Necessity?
PRDC '04 Proceedings of the 10th IEEE Pacific Rim International Symposium on Dependable Computing (PRDC'04)
Enhancing data cache reliability by the addition of a small fully-associative replication cache
Proceedings of the 18th annual international conference on Supercomputing
Computing Architectural Vulnerability Factors for Address-Based Structures
Proceedings of the 32nd annual international symposium on Computer Architecture
In-Register Duplication: Exploiting Narrow-Width Value for Improving Register File Reliability
DSN '06 Proceedings of the International Conference on Dependable Systems and Networks
Reducing Data Cache Susceptibility to Soft Errors
IEEE Transactions on Dependable and Secure Computing
Balancing Performance and Reliability in the Memory Hierarchy
ISPASS '05 Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005
Multi-bit Error Tolerant Caches Using Two-Dimensional Error Coding
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
SimTag: exploiting tag bits similarity to improve the reliability of the data caches
Proceedings of the Conference on Design, Automation and Test in Europe
A low-cost, systematic methodology for soft error robustness of logic circuits
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Hi-index | 0.01 |
With the continuous decrease in the minimum feature size and increase in the chip density due to technology scaling, on-chip L2 caches are becoming increasingly susceptible to multi-bit soft errors. The increase in multi-bit errors could lead to higher risk of data corruption and potentially result in the crashing of application programs. Traditionally, the L2 caches have been protected from soft errors using techniques such as: 1) error detection/correction codes; 2) physical interleaving of cache bit lines to convert multi-bit errors into single-bit errors; and 3) cache scrubbing. While the first two methods incur large area overheads for multi-bit errors, identifying the time interval for scrubbing could be tricky. In this paper, we investigate in detail the multi-bit soft error rates in large L2 caches and propose a framework of solutions for their correction based on the amount of redundancy present in the memory hierarchy. We investigate several new techniques for reducing multi-bit errors in large L2 caches, in which, the multi-bit errors are detected using simple error detection codes and corrected using the data redundancy in the memory hierarchy. We also propose several techniques to control/mine the redundancy in the memory hierarchy to further improve the reliability of the L2 cache. The proposed techniques were implemented in the Simplescalar framework and validated using the SPEC 2000 integer and floating point benchmarks for L2 cache vulnerability, global cache miss-rate, average cycle count and main memory write back rate, considering the area and power overheads. Experimental results indicate that the vulnerability of L2 caches can be decreased by 40% on the average for integer benchmarks and 32% on the average for floating point benchmarks, with an average multi-bit error coverage of about 96%, with significantly less area and power overheads and with virtually no performance penalty. The proposed techniques are applicable to both single and multi-core processor-based systems.