Simple but Effective Heterogeneous Main Memory with On-Chip Memory Controller Support

Authors:
Xiangyu Dong;Yuan Xie;Naveen Muralimanohar;Norman P. Jouppi
Affiliations:
-;-;-;-
Venue:
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Year:
2010

Citing 29
Cited 12

Improving IPC by kernel design

SOSP '93 Proceedings of the fourteenth ACM symposium on Operating systems principles
The SGI Origin: a ccNUMA highly scalable server

Proceedings of the 24th annual international symposium on Computer architecture
Memory access scheduling

Proceedings of the 27th annual international symposium on Computer architecture
The Impulse Memory Controller

IEEE Transactions on Computers
An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Simics: A Full System Simulation Platform

Computer
Reducing DRAM Latencies with an Integrated Memory Hierarchy Design

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
A Performance Comparison of DRAM Memory System Optimizations for SMT Processors

HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
A Cost-Effective Main Memory Organization for Future Servers

IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Optimizing Replication, Communication, and Capacity Allocation in CMPs

Proceedings of the 32nd annual international symposium on Computer Architecture
A NUCA substrate for flexible CMP cache sharing

Proceedings of the 19th annual international conference on Supercomputing
Spectral prefetcher: An effective mechanism for L2 cache prefetching

ACM Transactions on Architecture and Code Optimization (TACO)
Cooperative Caching for Chip Multiprocessors

Proceedings of the 33rd annual international symposium on Computer Architecture
A thermally-aware performance analysis of vertically integrated (3-D) processor-memory hierarchy

Proceedings of the 43rd annual Design Automation Conference
Architectural support for operating system-driven CMP cache management

Proceedings of the 15th international conference on Parallel architectures and compilation techniques
ASR: Adaptive Selective Replication for CMP Caches

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Managing Distributed, Shared L2 Caches through OS-Level Page Allocation

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Die Stacking (3D) Microarchitecture

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Processor Design in 3D Die-Stacking Technologies

IEEE Micro
Fully-Buffered DIMM Memory Architectures: Understanding Mechanisms, Overheads and Scaling

HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
Smart Refresh: An Enhanced Memory Controller Design for Reducing Energy in Conventional and 3D Die-Stacked DRAMs

Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
IBM POWER6 microarchitecture

IBM Journal of Research and Development
A Comprehensive Memory Modeling Tool and Its Application to the Design and Analysis of Future Memory Hierarchies

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
3D-Stacked Memory Architectures for Multi-core Processors

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Self-Optimizing Memory Controllers: A Reinforcement Learning Approach

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
PicoServer: Using 3D stacking technology to build energy efficient servers

ACM Journal on Emerging Technologies in Computing Systems (JETC)
COTSon: infrastructure for full system simulation

ACM SIGOPS Operating Systems Review
Prefetch-Aware DRAM Controllers

Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Extending the effectiveness of 3D-stacked DRAM caches with an adaptive multi-queue policy

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture

Page placement in hybrid memory systems

Proceedings of the international conference on Supercomputing
Efficiently enabling conventional block sizes for very large die-stacked DRAM caches

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Energy-efficient GPU design with reconfigurable in-package graphics memory

Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
Leveraging Heterogeneity in DRAM Main Memories to Accelerate Critical Word Access

MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Fundamental Latency Trade-off in Architecting DRAM Caches: Outperforming Impractical SRAM-Tags with a Simple and Practical Design

MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
A Mostly-Clean DRAM Cache for Effective Hit Speculation and Self-Balancing Dispatch

MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Future memory and interconnect technologies

Proceedings of the Conference on Design, Automation and Test in Europe
Die-stacked DRAM caches for servers: hit ratio, latency, or bandwidth? have it all with footprint cache

Proceedings of the 40th Annual International Symposium on Computer Architecture
Resilient die-stacked DRAM caches

Proceedings of the 40th Annual International Symposium on Computer Architecture
Software-controlled transparent management of heterogeneous memory resources in virtualized systems

Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
Kiln: closing the performance gap between systems with and without persistence support

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Optimizing GPU energy efficiency with 3D die-stacking graphics memory and reconfigurable memory interface

ACM Transactions on Architecture and Code Optimization (TACO)

Quantified Score

Hi-index	0.00

Visualization

Abstract

System-in-Package (SiP) and 3D integration are promising technologies to bring more memory onto a microprocessor package to mitigate the "memory wall" problem. In this paper, instead of using them to build caches, we study a heterogenous main memory using both on- and off-package memories providing both fast and high-bandwidth on-package accesses and expandable and low-cost commodity off-package memory capacity. We introduce another layer of address translation coupled with an on-chip memory controller that can dynamically migrate data between off-package and off-package memory either in hardware or with operating system assistance depending on the migration granularity. Our experimental results demonstrate that such design can achieve the average effectiveness of 83% of the ideal case where all memory can be placed in high-speed on-package memory for our simulated benchmarks.