RISC I: A Reduced Instruction Set VLSI Computer
ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
Logic-based eDRAM: origins and rationale for use
IBM Journal of Research and Development - Electrochemical technology in microelectronics
POWER5 System microarchitecture
IBM Journal of Research and Development - POWER5 and packaging
POWER4 system microarchitecture
IBM Journal of Research and Development
An hybrid eDRAM/SRAM macrocell to implement first-level data caches
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Rodinia: A benchmark suite for heterogeneous computing
IISWC '09 Proceedings of the 2009 IEEE International Symposium on Workload Characterization (IISWC)
IBM system z10 open systems adapter ethernet data router
IBM Journal of Research and Development
A characterization of the Rodinia benchmark suite with comparison to contemporary CMP workloads
IISWC '10 Proceedings of the IEEE International Symposium on Workload Characterization (IISWC'10)
A compile-time managed multi-level register file hierarchy
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
A Hierarchical Thread Scheduler and Register File for Energy-Efficient Throughput Processors
ACM Transactions on Computer Systems (TOCS)
Circuit design of a dual-versioning L1 data cache
Integration, the VLSI Journal
Energy-efficient GPU design with reconfigurable in-package graphics memory
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
Boosting mobile GPU performance with a decoupled access/execute fragment processor
Proceedings of the 39th Annual International Symposium on Computer Architecture
Proceedings of the International Conference on Computer-Aided Design
An energy-efficient and scalable eDRAM-based register file architecture for GPGPU
Proceedings of the 40th Annual International Symposium on Computer Architecture
APOGEE: adaptive prefetching on GPUs for energy efficiency
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Exploring hybrid memory for GPU energy efficiency through software-hardware co-design
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Warped gates: gating aware scheduling and power gating for GPGPUs
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |
Large register files are common in highly multi-threaded architectures such as GPUs. This paper presents a hybrid memory design that tightly integrates embedded DRAM into SRAM cells with a main application to reducing area and power consumption of multi-threaded register files. In the hybrid memory, each SRAM cell is augmented with multiple DRAM cells so that multiple bits can be stored in each cell. This configuration results in significant area and energy savings compared to the SRAM array with the same capacity due to compact DRAM cells. On other hand, the hybrid memory requires explicit data movements in order to access DRAM contexts. In order to minimize context switching impact, we introduce write-back buffers, background context switching, and context-aware thread scheduling, to the processor pipeline and the scheduler. Circuit and architecture simulations of GPU benchmarks suites show significant savings in register file area (38%) and energy (68%) over the traditional SRAM implementation, with minimal (1.4%) performance loss.