MAGE: adaptive granularity and ECC for resilient and power efficient memory systems

Authors:
Sheng Li;Doe Hyun Yoon;Ke Chen;Jishen Zhao;Jung Ho Ahn;Jay B. Brockman;Yuan Xie;Norman P. Jouppi
Affiliations:
Hewlett-Packard Labs;Hewlett-Packard Labs;University of Notre Dame and Hewlett-Packard Labs;Pennsylvania State University and Hewlett-Packard Labs;Seoul National University;University of Notre Dame;Pennsylvania State University and AMD Research;Hewlett-Packard Labs
Venue:
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Year:
2012

Abstract

Resiliency is one of the toughest challenges in high-performance computing, and memory accounts for a significant fraction of errors. Providing strong error tolerance in memory usually requires a wide memory channel that incurs a large access granularity (hence, a large cache line). Unfortunately, applications with limited spatial locality waste memory power and bandwidth on systems with a large access granularity. Thus, careful design considerations must be made to balance memory system performance, power efficiency, and resiliency. In this paper, we propose MAGE, a Memory system with Adaptive Granularity and ECC, to achieve high performance, power efficiency, and resiliency. MAGE can adapt memory access granularities and ECC schemes to applications with different memory behaviors. Our experiments show that MAGE achieves more than a 28% energy-delay product improvement, compared to the best existing systems with static granularity and ECC.
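
The trade-off described in the abstract can be made concrete with a small sketch. It is not taken from the paper: the function names, the assumption that the useful bytes are contiguous and aligned, and all numeric values are hypothetical and chosen only for illustration. It shows how a larger access granularity wastes transferred bytes when an application touches only a few bytes per access, and how a relative energy-delay product (EDP) improvement could be computed under one common definition.

```python
def wasted_fraction(useful_bytes: int, granularity: int) -> float:
    """Fraction of transferred bytes the application never uses.

    Assumes (hypothetically) that the useful bytes are contiguous and
    aligned to the start of an access.
    """
    # Ceiling division: number of fixed-size accesses needed to cover the data.
    accesses = -(-useful_bytes // granularity)
    transferred = accesses * granularity
    return 1.0 - useful_bytes / transferred


def edp(energy: float, delay: float) -> float:
    """Energy-delay product; lower is better."""
    return energy * delay


def edp_improvement(baseline: float, candidate: float) -> float:
    """Relative EDP reduction of the candidate versus the baseline."""
    return (baseline - candidate) / baseline


if __name__ == "__main__":
    # Hypothetical workload: 8 useful bytes per access (poor spatial locality).
    for granularity in (32, 64, 128):
        print(granularity, wasted_fraction(8, granularity))

    # Illustrative, normalized numbers only; not results from the paper.
    baseline = edp(energy=1.0, delay=1.0)
    candidate = edp(energy=0.85, delay=0.9)
    print(edp_improvement(baseline, candidate))
```

Under these assumptions, fetching 8 useful bytes at a 64-byte granularity leaves seven eighths of the transferred data unused, which is the kind of waste an adaptive-granularity memory system aims to reduce. The paper may define its EDP improvement differently from the relative reduction used here.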