Using managed runtime systems to tolerate holes in wearable memories

Authors:
Tiejun Gao; Karin Strauss; Stephen M. Blackburn; Kathryn S. McKinley; Doug Burger; James Larus
Affiliations:
Australian National University, Canberra, Australia; Microsoft Research, Redmond, USA; Australian National University, Canberra, Australia; Microsoft Research, Redmond, USA; Microsoft Research, Redmond, USA; Microsoft Research, Redmond, USA
Venue:
Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation
Year:
2013

Abstract

New memory technologies, such as phase-change memory (PCM), promise denser and cheaper main memory, and are expected to displace DRAM. However, many of them experience permanent failures far more quickly than DRAM. DRAM mechanisms that handle permanent failures rely on very low failure rates and, if directly applied to PCM, are extremely inefficient: discarding a page when the first line fails wastes 98% of the memory. This paper proposes low-complexity, cooperative software and hardware that handle failure rates as high as 50%. Our approach makes error handling transparent to the application by using the memory abstraction offered by managed languages. Once hardware error correction for a memory line is exhausted, rather than discarding the entire page, the hardware communicates the failed line to a failure-aware OS and runtime. The runtime ensures that memory allocations never use failed lines and moves data when lines fail during program execution. This paper describes minimal extensions to an Immix mark-region garbage collector, which correctly utilizes pages with failed physical lines by skipping over failures. This paper also proposes hardware support that clusters failed lines at one end of a memory region to reduce fragmentation and improve performance under failures. Contrary to accepted hardware wisdom that advocates wear-leveling, we show that, with software support, non-uniform failures delay the impact of memory failure. Together, these mechanisms incur no performance overhead when there are no failures, and at failure levels of 10% and 50% they suffer average overheads of only 4% and 12%, respectively. These results indicate that hardware and software cooperation can greatly extend the life of wearable memories.
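
The idea of allocating around failed lines can be illustrated with a small sketch. This is not the paper's Immix extension: it is a simplified bump allocator written for illustration only, and the names `FailureAwareBlock` and `usable_runs`, the 256-byte line size, and the 128-line block size are all assumptions rather than values taken from the abstract. It shows only one point, that an allocator which knows the set of failed lines can place objects in the remaining contiguous runs instead of discarding the whole region.

```python
from typing import Iterable, List, Optional, Set, Tuple

LINE_SIZE = 256        # bytes per line (assumed for illustration)
LINES_PER_BLOCK = 128  # lines per block (assumed for illustration)


def usable_runs(failed: Set[int],
                n_lines: int = LINES_PER_BLOCK) -> List[Tuple[int, int]]:
    """Return half-open (start_line, end_line) runs that contain no failed line."""
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for line in range(n_lines):
        if line in failed:
            if start is not None:      # a failed line closes the current run
                runs.append((start, line))
                start = None
        elif start is None:            # a healthy line opens a new run
            start = line
    if start is not None:
        runs.append((start, n_lines))
    return runs


class FailureAwareBlock:
    """Hypothetical bump allocator that never hands out bytes in failed lines."""

    def __init__(self, failed: Iterable[int],
                 n_lines: int = LINES_PER_BLOCK) -> None:
        self.runs = usable_runs(set(failed), n_lines)
        self.run_index = 0
        self.cursor = self.runs[0][0] * LINE_SIZE if self.runs else 0

    def alloc(self, size: int) -> Optional[int]:
        """Return the byte offset of a new object, or None if nothing fits."""
        while self.run_index < len(self.runs):
            _, end_line = self.runs[self.run_index]
            if self.cursor + size <= end_line * LINE_SIZE:
                addr = self.cursor
                self.cursor += size
                return addr
            # The object does not fit in this run: skip over the failed
            # line(s) and continue at the start of the next usable run.
            self.run_index += 1
            if self.run_index < len(self.runs):
                self.cursor = self.runs[self.run_index][0] * LINE_SIZE
        return None
```

In this sketch, a four-line block with lines 1 and 2 marked as failed still serves allocations from lines 0 and 3. It also hints at why clustering failures at one end of a region helps: fewer, longer runs leave room for larger objects, whereas scattered failures split the block into short runs.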