Exploring hybrid memory for GPU energy efficiency through software-hardware co-design

Authors:
Bin Wang;Bo Wu;Dong Li;Xipeng Shen;Weikuan Yu;Yizheng Jiao;Jeffrey S. Vetter
Affiliations:
Auburn University, Auburn, AL, USA;College of William & Mary, Williamsburg, VA, USA;Oak Ridge National Laboratory, Oak Ridge, TN, USA;College of William & Mary, Williamsburg, VA, USA;Auburn University, Auburn, AL, USA;Auburn University, Auburn, AL, USA;Oak Ridge National Laboratory, Oak Ridge, TN, USA
Venue:
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Year:
2013

Citing 25
Cited 0

Power aware page allocation

ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Power-performance considerations of parallel computing on chip multiprocessors

ACM Transactions on Architecture and Code Optimization (TACO)
A compiler framework for optimization of affine loop nests for gpgpus

Proceedings of the 22nd annual international conference on Supercomputing
OpenMP to GPGPU: a compiler framework for automatic translation and optimization

Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Architecting phase change memory as a scalable dram alternative

Proceedings of the 36th annual international symposium on Computer architecture
A durable and energy efficient main memory using phase change memory technology

Proceedings of the 36th annual international symposium on Computer architecture
Scalable high performance main memory system using phase-change memory technology

Proceedings of the 36th annual international symposium on Computer architecture
An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness

Proceedings of the 36th annual international symposium on Computer architecture
PDRAM: a hybrid PRAM and DRAM main memory system

Proceedings of the 46th Annual Design Automation Conference
Exploring Phase Change Memory and 3D Die-Stacking for Power/Thermal Friendly, Fast and Durable Memory Architectures

PACT '09 Proceedings of the 2009 18th International Conference on Parallel Architectures and Compilation Techniques
The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines

The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines
Rodinia: A benchmark suite for heterogeneous computing

IISWC '09 Proceedings of the 2009 IEEE International Symposium on Workload Characterization (IISWC)
Micro-pages: increasing DRAM efficiency with locality-aware data placement

Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
An integrated GPU power and performance model

Proceedings of the 37th annual international symposium on Computer architecture
Operating system support for NVM+DRAM hybrid main memory

HotOS'09 Proceedings of the 12th conference on Hot topics in operating systems
On-the-fly elimination of dynamic irregularities for GPU computing

Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
Sponge: portable stream programming on graphics engines

Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
Page placement in hybrid memory systems

Proceedings of the international conference on Supercomputing
DRAMSim2: A Cycle Accurate Memory System Simulator

IEEE Computer Architecture Letters
SRAM-DRAM hybrid memory with applications to efficient register files in fine-grained multi-threading

Proceedings of the 38th annual international symposium on Computer architecture
Power management of hybrid DRAM/PRAM-based main memory

Proceedings of the 48th Design Automation Conference
Energy-aware writes to non-volatile main memory

HotPower '11 Proceedings of the 4th Workshop on Power-Aware Computing and Systems
Identifying Opportunities for Byte-Addressable Non-Volatile Memory in Extreme-Scale Scientific Applications

IPDPS '12 Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium
PCM-Based Durable Write Cache for Fast Disk I/O

MASCOTS '12 Proceedings of the 2012 IEEE 20th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems
Performance and Power Simulation for Versatile GPGPU Global Memory

IPDPSW '13 Proceedings of the 2013 IEEE 27th International Symposium on Parallel and Distributed Processing Workshops and PhD Forum

Quantified Score

Hi-index	0.00

Visualization

Abstract

Hybrid memory designs, such as DRAM plus Phase Change Memory (PCM), have shown some promise for alleviating power and density issues faced by traditional memory systems. But previous studies have concentrated on CPU systems with a modest level of parallelism. This work studies the problem in a massively parallel setting. Specifically, it investigates the special implications to hybrid memory imposed by the massive parallelism in GPU. It empirically shows that, contrary to promising results demonstrated for CPU, previous designs of PCM-based hybrid memory result in significant degradation to the energy efficiency of GPU. It reveals that the fundamental reason comes from a multi-facet mismatch between those designs and the massive parallelism in GPU. It presents a solution that centers around a close cooperation between compiler-directed data placement and hardware-assisted runtime adaptation. The co-design approach helps tap into the full potential of hybrid memory for GPU without requiring dramatic hardware changes over previous designs, yielding 6% and 49% energy saving on average compared to pure DRAM and pure PCM respectively, and keeping performance loss less than 2%.