Trading off Cache Capacity for Reliability to Enable Low Voltage Operation
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Memory mapped ECC: low-cost error protection for last level caches
Proceedings of the 36th annual international symposium on Computer architecture
CASES '09 Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems
Flexible cache error protection using an ECC FIFO
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Improving cache lifetime reliability at ultra-low voltages
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
ZerehCache: armoring cache architectures in high defect density technologies
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Shoestring: probabilistic soft error reliability on the cheap
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Reducing cache power with low-cost, multi-bit error-correcting codes
Proceedings of the 37th annual international symposium on Computer architecture
Necromancer: enhancing system throughput by animating dead cores
Proceedings of the 37th annual international symposium on Computer architecture
Modeling soft errors for data caches and alleviating their effects on data reliability
Microprocessors & Microsystems
Design techniques for cross-layer resilience
Proceedings of the Conference on Design, Automation and Test in Europe
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Parichute: Generalized Turbocode-Based Error Correction for Near-Threshold Caches
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
A framework for correction of multi-bit soft errors in L2 caches based on redundancy
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
CPPC: correctable parity protected cache
Proceedings of the 38th annual international symposium on Computer architecture
Energy-efficient cache design using variable-strength error-correcting codes
Proceedings of the 38th annual international symposium on Computer architecture
Matching cache access behavior and bit error pattern for high performance low Vcc L1 cache
Proceedings of the 48th Design Automation Conference
FFT-cache: a flexible fault-tolerant cache architecture for ultra low voltage operation
CASES '11 Proceedings of the 14th international conference on Compilers, architectures and synthesis for embedded systems
Unequal-error-protection codes in SRAMs for mobile multimedia applications
Proceedings of the International Conference on Computer-Aided Design
Evaluating application vulnerability to soft errors in multi-level cache hierarchy
Euro-Par'11 Proceedings of the 2011 international conference on Parallel Processing - Volume 2
Setting an error detection infrastructure with low cost acoustic wave detectors
Proceedings of the 39th Annual International Symposium on Computer Architecture
A novel NoC-based design for fault-tolerance of last-level caches in CMPs
Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Breaking the energy barrier in fault-tolerant caches for multicore systems
Proceedings of the Conference on Design, Automation and Test in Europe
Dynamic reduction of voltage margins by leveraging on-chip ECC in Itanium II processors
Proceedings of the 40th Annual International Symposium on Computer Architecture
Hierarchical decoding of double error correcting codes for high speed reliable memories
Proceedings of the 50th Annual Design Automation Conference
Product code schemes for error correction in MLC NAND flash memories
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Replicating tag entries for reliability enhancement in cache tag arrays
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Exploring the vulnerability of CMPs to soft errors with 3D stacked nonvolatile memory
ACM Journal on Emerging Technologies in Computing Systems (JETC)
A hybrid HW-SW approach for intermittent error mitigation in streaming-based embedded systems
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Exploring DRAM organizations for energy-efficient and resilient exascale memories
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Low-power, low-storage-overhead chipkill correct via multi-line error correction
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Embedded RAIDs-on-chip for bus-based chip-multiprocessors
ACM Transactions on Embedded Computing Systems (TECS)
Implicit-storing and redundant-encoding-of-attribute information in error-correction-codes
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Epipe: A low-cost fault-tolerance technique considering WCET constraints
Journal of Systems Architecture: the EUROMICRO Journal
DeSyRe: On-demand system reliability
Microprocessors & Microsystems
A column parity based fault detection mechanism for FIFO buffers
Integration, the VLSI Journal
NoC-based fault-tolerant cache design in chip multiprocessors
ACM Transactions on Embedded Computing Systems (TECS) - Special Issue on Design Challenges for Many-Core Processors, Special Section on ESTIMedia'13 and Regular Papers
Supporting faulty banks in NUCA by NoC assisted remapping mechanisms
The Journal of Supercomputing
Hi-index | 0.00 |
In deep sub-micron ICs, growing amounts of on- die memory and scaling effects make embedded memories increasingly vulnerable to reliability and yield problems. As scaling progresses, soft and hard errors in the memory system will increase and single error events are more likely to cause large-scale multi- bit errors. However, conventional memory protection techniques can neither detect nor correct large-scale multi-bit errors without incurring large performance, area, and power overheads. We propose two-dimensional (2D) error coding in embedded memories, a scalable multi-bit error protection technique to improve memory reliability and yield. The key innovation is the use of vertical error coding across words that is used only for error correction in combination with conventional per-word horizontal error coding. We evaluate this scheme in the cache hierarchies of two representative chip multiprocessor designs and show that 2D error coding can correct clustered errors up to 32x32 bits with significantly smaller performance, area, and power overheads than conventional techniques.