Memory-centric system interconnect design with hybrid memory cubes

Authors:
Gwangsun Kim; John Kim; Jung Ho Ahn; Jaeha Kim
Affiliations:
KAIST, Daejeon, South Korea; KAIST, Daejeon, South Korea; Seoul National University, Suwon, Gyeonggi-do, South Korea; Seoul National University, Seoul, South Korea
Venue:
PACT '13: Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques
Year:
2013



Abstract

Memory bandwidth has been one of the most critical system performance bottlenecks. As a result, the HMC (Hybrid Memory Cube) has recently been proposed to improve DRAM bandwidth as well as energy efficiency. In this paper, we explore different system interconnect designs with HMCs. We show that processor-centric network architectures cannot fully utilize processor bandwidth across different traffic patterns. Thus, we propose a memory-centric network in which all processor channels are connected to HMCs and not to any other processors as all communication between processors goes through intermediate HMCs. Since there are multiple HMCs per processor, we propose a distributor-based network to reduce the network diameter and achieve lower latency while properly distributing the bandwidth across different routers and providing path diversity. Memory-centric networks lead to some challenges including higher processor-to-processor latency and the need to properly exploit the path diversity. We propose a pass-through microarchitecture, which, in combination with the proper intra-HMC organization, reduces the zero-load latency while exploiting adaptive (and non-minimal) routing to load-balance across different channels. Our results show that memory-centric networks can efficiently utilize processor bandwidth for different traffic patterns and achieve higher performance by providing higher memory bandwidth and lower latency.