Architecting phase change memory as a scalable dram alternative
Proceedings of the 36th annual international symposium on Computer architecture
A durable and energy efficient main memory using phase change memory technology
Proceedings of the 36th annual international symposium on Computer architecture
Better I/O through byte-addressable, persistent memory
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
Enhancing lifetime and security of PCM-based main memory with start-gap wear leveling
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Dynamically replicated memory: building reliable systems from nanoscale resistive memories
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Use ECP, not ECC, for hard failures in resistive memories
Proceedings of the 37th annual international symposium on Computer architecture
Proceedings of the 37th annual international symposium on Computer architecture
SAFER: Stuck-At-Fault Error Recovery for Memories
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
FREE-p: Protecting non-volatile memory against both hard and soft errors
HPCA '11 Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture
Pay-As-You-Go: low-overhead hard-error correction for phase change memories
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
DSN '12 Proceedings of the 2012 42nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)
RePRAM: Re-cycling PRAM faulty blocks for extended lifetime
DSN '12 Proceedings of the 2012 42nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)
NVM duet: unified working memory and persistent store architecture
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Hi-index | 0.00 |
While Phase Change Memory (PCM) holds a great promise as a complement or even replacement of DRAM-based memory and flash-based storage, it must effectively overcome its limit on write endurance to be a reliable device for an extended period of intensive use. The limited write endurance can lead to permanent stuck-at faults after a certain number of writes, which causes some memory cells permanently stuck at either '0' or '1'. State-of-the-art solutions apply a bit inversion technique on selected bit groups of a data block after its partitioning. The effectiveness of this approach hinges on how a data block is partitioned into bit groups. While all existing solutions can separate faults into different groups for error correction, they are inadequate on three fundamental capabilities desired for any partition scheme. First, it can maximize probability of successfully re-partitioning a block so that two faults currently in the same group are placed into two new groups. Second, it can partition a block into a small number of groups for space efficiency. Third, it should spread out faults across the groups as uniformly as possible, so that more faults can be accommodated within the same number of groups. A recovery solution with these capabilities can provide strong fault tolerance with minimal overhead. We propose Aegis, a recovery solution with a systematical partition scheme using fewer groups to accommodate more faults compared with state-of-the-art schemes. The uniqueness of Aegis's partition scheme lies on its guarantee that any two bits in the same group will not be in the same group after a re-partition. Empowered by the partition scheme, Aegis can recover significantly more faults with reduced space overhead relative to state-of-the-art solutions.