Optimizing GPU energy efficiency with 3D die-stacking graphics memory and reconfigurable memory interface

Authors:
Jishen Zhao; Guangyu Sun; Gabriel H. Loh; Yuan Xie
Affiliations:
Pennsylvania State University CSE Department; Peking University; Advanced Micro Devices, Inc. AMD Research; Pennsylvania State University CSE Department
Venue:
ACM Transactions on Architecture and Code Optimization (TACO)
Year:
2013


Abstract

The performance of graphics processing unit (GPU) systems is improving rapidly to meet the increasing demands of graphics and high-performance computing applications. This performance improvement, however, has dramatically increased the power consumption of GPU systems. Up to 30% of the total power of a GPU system is consumed by the graphics memory itself. Reducing graphics memory power consumption is therefore critical to mitigating the power challenge. In this article, we propose an energy-efficient reconfigurable 3D die-stacking graphics memory design that integrates wide-interface graphics DRAMs side-by-side with a GPU processor on a silicon interposer. The proposed architecture is a “3D+2.5D” system: the DRAM itself is 3D-stacked memory with through-silicon vias (TSVs), whereas the DRAM and the GPU processor are integrated through the interposer (2.5D). Since the GPU computing units, memory controllers, and memory are all integrated in the same package, the number of memory I/Os is no longer constrained by the package’s pin count. We can therefore reduce memory power consumption by scaling down the supply voltage and frequency of the memory interface while maintaining the same or even higher peak memory bandwidth. In addition, we design a reconfigurable memory interface that can dynamically adapt to the requirements of various applications. We propose two reconfiguration mechanisms, one to optimize GPU system energy efficiency and one to optimize throughput, which together benefit both memory-intensive and compute-intensive applications. The experimental results show that the proposed GPU memory architecture improves GPU system energy efficiency by 21% without reconfiguration. The reconfigurable memory interface further improves system energy efficiency by 26% and system throughput by 31% under a capped system power budget of 240W.
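
The trade-off between interface width, frequency, and supply voltage described in the abstract can be illustrated with a rough first-order sketch. The code below is not the authors' model: it assumes peak bandwidth is bus width times per-pin data rate, and that interface switching power scales as pins × C × V² × f with the same per-pin capacitance in both cases. The two configurations and all of their numbers are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def peak_bandwidth_gb_per_s(bus_width_bits, data_rate_gtps):
    """Peak bandwidth in GB/s: bits per transfer times transfers per ns, over 8."""
    return bus_width_bits * data_rate_gtps / 8.0


def relative_switching_power(bus_width_bits, data_rate_gtps, vdd, cap_per_pin=1.0):
    """First-order dynamic power in arbitrary units: pins * C * V^2 * f.

    Assumption: every pin has the same capacitance and activity factor;
    static power and DRAM core power are ignored.
    """
    return bus_width_bits * cap_per_pin * vdd ** 2 * data_rate_gtps


# Hypothetical off-package interface: narrow, fast, higher voltage.
narrow = {"bus_width_bits": 256, "data_rate_gtps": 4.0, "vdd": 1.5}
# Hypothetical in-package interface: 4x wider, 4x slower, lower voltage.
wide = {"bus_width_bits": 1024, "data_rate_gtps": 1.0, "vdd": 1.2}

bw_narrow = peak_bandwidth_gb_per_s(narrow["bus_width_bits"], narrow["data_rate_gtps"])
bw_wide = peak_bandwidth_gb_per_s(wide["bus_width_bits"], wide["data_rate_gtps"])

p_narrow = relative_switching_power(**narrow)
p_wide = relative_switching_power(**wide)
power_ratio = p_wide / p_narrow

print(f"narrow: {bw_narrow:.0f} GB/s, wide: {bw_wide:.0f} GB/s")
print(f"wide/narrow switching power ratio: {power_ratio:.2f}")
```

Under these assumptions both configurations deliver the same peak bandwidth, and because width times frequency is held constant, the power ratio reduces to the square of the voltage ratio. A real in-package interface would likely also differ in per-pin capacitance and static power, which this sketch does not capture.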