Error-control coding for computer systems
Error-control coding for computer systems
Testing semiconductor memories: theory and practice
Testing semiconductor memories: theory and practice
Area efficient architectures for information integrity in cache memories
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Power4 System Design for High Reliability
IEEE Micro
On Computing Multiplicative Inverses in GF(2/sup m/)
IEEE Transactions on Computers
Testing of Digital Systems
Code Design for Dependable Systems: Theory and Practical Application
Code Design for Dependable Systems: Theory and Practical Application
DSD '07 Proceedings of the 10th Euromicro Conference on Digital System Design Architectures, Methods and Tools
Multi-bit Error Tolerant Caches Using Two-Dimensional Error Coding
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Trading off Cache Capacity for Reliability to Enable Low Voltage Operation
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Memory mapped ECC: low-cost error protection for last level caches
Proceedings of the 36th annual international symposium on Computer architecture
Error-correcting codes for semiconductor memory applications: a state-of-the-art review
IBM Journal of Research and Development
Improving cache lifetime reliability at ultra-low voltages
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
ZerehCache: armoring cache architectures in high defect density technologies
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Low Vccmin fault-tolerant cache with highly predictable performance
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Reducing cache power with low-cost, multi-bit error-correcting codes
Proceedings of the 37th annual international symposium on Computer architecture
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Low-Latency Mechanisms for Near-Threshold Operation of Private Caches in Shared Memory Multicores
MICROW '12 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture Workshops
Breaking the energy barrier in fault-tolerant caches for multicore systems
Proceedings of the Conference on Design, Automation and Test in Europe
ArchShield: architectural framework for assisting DRAM scaling by tolerating high error rates
Proceedings of the 40th Annual International Symposium on Computer Architecture
NoC-based fault-tolerant cache design in chip multiprocessors
ACM Transactions on Embedded Computing Systems (TECS) - Special Issue on Design Challenges for Many-Core Processors, Special Section on ESTIMedia'13 and Regular Papers
Hi-index | 0.01 |
Voltage scaling is one of the most effective mechanisms to improve microprocessors' energy efficiency. However, processors cannot operate reliably below a minimum voltage, Vccmin, since hardware structures may fail. Cell failures in large memory arrays (e.g., caches) typically determine Vccmin for the whole processor. We observe that most cache lines exhibit zero or one failures at low voltages. However, a few lines, especially in large caches, exhibit multi-bit failures and increase Vccmin. Previous solutions either significantly reduce cache capacity to enable uniform error correction across all lines, or significantly increase latency and bandwidth overheads when amortizing the cost of error-correcting codes (ECC) over large lines. In this paper, we propose a novel cache architecture that uses variable-strength error-correcting codes (VS-ECC). In the common case, lines with zero or one failures use a simple and fast ECC. A small number of lines with multi-bit failures use a strong multi-bit ECC that requires some additional area and latency. We present a novel dynamic cache characterization mechanism to determine which lines will exhibit multi-bit failures. In particular, we use multi-bit correction to protect a fraction of the cache after switching to low voltage, while dynamically testing the remaining lines for multi-bit failures. Compared to prior multi-bit-correcting proposals, VS-ECC significantly reduces power and energy, avoids significant reductions in cache capacity, incurs little area overhead, and avoids large increases in latency and bandwidth.