SAFER: Stuck-At-Fault Error Recovery for Memories

Authors:
Nak Hee Seong;Dong Hyuk Woo;Vijayalakshmi Srinivasan;Jude A. Rivers;Hsien-Hsin S. Lee
Venue:
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Year:
2010

Abstract

As technology scaling threatens further DRAM scaling because of physical limitations such as limited charge, alternative memory technologies, including several emerging non-volatile memories, are being explored as possible DRAM replacements. One main roadblock to wider adoption of these new memories is their limited write endurance, which leads to wear-out-related permanent failures. Furthermore, technology scaling increases the variation in cell lifetime, resulting in early failures of many cells. Existing error-correcting techniques are primarily devised for recovering from transient faults and are not suitable for recovering from permanent stuck-at faults, which tend to increase gradually with repeated write cycles. In this paper, we propose SAFER, a novel hardware-efficient multi-bit stuck-at fault error recovery scheme for resistive memories, which can function in conjunction with existing wear-leveling techniques. SAFER exploits the key attribute that a failed cell with a stuck-at value is still readable, making it possible to continue to use the failed cell to store data, thereby reducing the hardware overhead for error recovery. SAFER partitions a data block dynamically while ensuring that there is at most one failed bit per partition and uses single-error-correction techniques per partition for failure recovery. SAFER increases the number of recoverable failures and achieves better lifetime improvement with smaller hardware overhead relative to the recently proposed Error Correcting Pointers and even an ideal Hamming coding scheme.
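
The partition-then-correct idea in the abstract can be illustrated with a small sketch. It is not the paper's hardware design: the abstract does not specify how partitions are chosen or which single-error technique is used, so the sketch assumes that partitions are formed from selected bits of each cell's index and that a partition is stored inverted whenever its one stuck cell disagrees with the data. All function names (`group_of`, `choose_partition_bits`, `write_block`, `read_block`) are hypothetical.

```python
from itertools import combinations


def group_of(pos, fields):
    """Map a cell index to a partition number using the chosen index bits."""
    g = 0
    for i, f in enumerate(fields):
        g |= ((pos >> f) & 1) << i
    return g


def choose_partition_bits(fault_positions, index_bits, max_fields):
    """Find the fewest index bits that place every stuck cell in its own partition.

    Returns a tuple of bit positions, or None if no choice of up to
    max_fields bits separates all the faults (assumed search strategy).
    """
    for k in range(max_fields + 1):
        for fields in combinations(range(index_bits), k):
            groups = [group_of(p, fields) for p in fault_positions]
            if len(set(groups)) == len(groups):
                return fields
    return None


def write_block(data, stuck, fields):
    """Store data so that each stuck cell already holds the value it must hold.

    data:  list of 0/1 bits
    stuck: dict mapping cell index -> stuck-at value
    Returns (cells, invert), where invert has one flag per partition.
    """
    invert = [False] * (1 << len(fields))
    for pos, val in stuck.items():
        # Invert the whole partition if its stuck cell disagrees with the data.
        invert[group_of(pos, fields)] = data[pos] != val
    cells = [bit ^ invert[group_of(i, fields)] for i, bit in enumerate(data)]
    for pos, val in stuck.items():
        cells[pos] = val  # a stuck cell keeps its value whatever is written
    return cells, invert


def read_block(cells, invert, fields):
    """Undo the per-partition inversion; stuck cells are read like any other."""
    return [bit ^ invert[group_of(i, fields)] for i, bit in enumerate(cells)]


# Example: an 8-bit block with cell 1 stuck at 0 and cell 6 stuck at 1.
data = [1, 1, 0, 0, 1, 0, 0, 1]
stuck = {1: 0, 6: 1}
fields = choose_partition_bits(list(stuck), index_bits=3, max_fields=2)
cells, invert = write_block(data, stuck, fields)
recovered = read_block(cells, invert, fields)
```

In this sketch the only extra state per block is the set of chosen index bits and one inversion flag per partition, which is the sense in which reusing a readable stuck cell can keep recovery overhead low.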