Energy-efficient cache design using variable-strength error-correcting codes

Authors:
Alaa R. Alameldeen; Ilya Wagner; Zeshan Chishti; Wei Wu; Chris Wilkerson; Shih-Lien Lu
Affiliations:
Intel Corporation, Hillsboro, OR, USA (all authors)
Venue:
Proceedings of the 38th Annual International Symposium on Computer Architecture
Year:
2011

Abstract

Voltage scaling is one of the most effective mechanisms to improve microprocessors' energy efficiency. However, processors cannot operate reliably below a minimum voltage, Vccmin, since hardware structures may fail. Cell failures in large memory arrays (e.g., caches) typically determine Vccmin for the whole processor. We observe that most cache lines exhibit zero or one failures at low voltages. However, a few lines, especially in large caches, exhibit multi-bit failures and increase Vccmin. Previous solutions either significantly reduce cache capacity to enable uniform error correction across all lines, or significantly increase latency and bandwidth overheads when amortizing the cost of error-correcting codes (ECC) over large lines. In this paper, we propose a novel cache architecture that uses variable-strength error-correcting codes (VS-ECC). In the common case, lines with zero or one failures use a simple and fast ECC. A small number of lines with multi-bit failures use a strong multi-bit ECC that requires some additional area and latency. We present a novel dynamic cache characterization mechanism to determine which lines will exhibit multi-bit failures. In particular, we use multi-bit correction to protect a fraction of the cache after switching to low voltage, while dynamically testing the remaining lines for multi-bit failures. Compared to prior multi-bit-correcting proposals, VS-ECC significantly reduces power and energy, avoids significant reductions in cache capacity, incurs little area overhead, and avoids large increases in latency and bandwidth.
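The two ideas in the abstract, that most lines show zero or one failures and that protection strength can be chosen per line, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the independent-failure model, the 512-bit line, the 1e-3 cell failure probability, the strong-code reach of 4 bits, and the "disabled" fallback are all assumptions made for illustration.

```python
from math import comb


def line_failure_distribution(bits_per_line, p_cell, max_k=2):
    """Return [P(0), P(1), ..., P(max_k - 1), P(>= max_k)] failing cells in one line.

    Assumes independent cell failures with probability p_cell (a simplification).
    """
    probs = [
        comb(bits_per_line, k) * p_cell**k * (1 - p_cell) ** (bits_per_line - k)
        for k in range(max_k)
    ]
    probs.append(1.0 - sum(probs))
    return probs


def assign_ecc(failures_per_line, strong_capacity, strong_t=4):
    """Pick a protection level per line from its measured failure count.

    'weak'     : zero or one failing cells, handled by the simple, fast code
    'strong'   : multi-bit failures within the strong code's reach (strong_t,
                 a hypothetical parameter), while strong-ECC resources remain
    'disabled' : hypothetical fallback when neither option fits
    """
    remaining = strong_capacity
    plan = []
    for failures in failures_per_line:
        if failures <= 1:
            plan.append("weak")
        elif failures <= strong_t and remaining > 0:
            plan.append("strong")
            remaining -= 1
        else:
            plan.append("disabled")
    return plan


if __name__ == "__main__":
    # Illustrative numbers only: 512-bit lines, 1e-3 per-cell failure probability.
    p0, p1, p_multi = line_failure_distribution(bits_per_line=512, p_cell=1e-3)
    print(f"P(0)={p0:.3f}  P(1)={p1:.3f}  P(>=2)={p_multi:.3f}")
    print(assign_ecc([0, 1, 2, 5, 3], strong_capacity=1))
```

Under these assumed numbers the large majority of lines fall into the zero-or-one-failure class, which is why reserving the strong code for a small fraction of lines is attractive. The failure counts fed to `assign_ecc` stand in for the output of the dynamic characterization step that the abstract describes.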