Versatile refresh: low complexity refresh scheduling for high-throughput multi-banked eDRAM

Authors:
Mohammad Alizadeh; Adel Javanmard; Shang-Tse Chuang; Sundar Iyer; Yi Lu
Affiliations:
Stanford University, Stanford, CA, USA; Stanford University, Stanford, CA, USA; Memoir Systems, Santa Clara, CA, USA; Memoir Systems, Santa Clara, CA, USA; University of Illinois at Urbana-Champaign, Urbana, IL, USA
Venue:
Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE joint international conference on Measurement and Modeling of Computer Systems
Year:
2012

Abstract

Multi-banked embedded DRAM (eDRAM) has become increasingly popular in high-performance systems. However, the data retention problem of eDRAM is exacerbated by the larger number of banks and the high-performance environment in which it is deployed: the data retention time of each memory cell decreases while the number of cells to be refreshed increases. To address this, multi-bank designs offer a concurrent refresh mode, in which idle banks can be refreshed concurrently with read and write operations. However, conventional techniques such as periodically scheduling refreshes, with priority given to refreshes when they conflict with reads or writes, have variable performance, increase read latency, and can perform poorly under worst-case memory access patterns. We propose a novel refresh scheduling algorithm, Versatile Refresh (VR), that is low-complexity, produces near-optimal throughput with universal guarantees, and is tolerant to bursty memory access patterns. The central idea is to decouple the scheduler into two simple-to-implement modules: one determines which cell to refresh next, and the other determines when to force an idle cycle in all banks. We derive necessary and sufficient conditions that guarantee data integrity for all access patterns, for any given number of banks, rows per bank, read/write ports, and data retention time. Our analysis shows that there is a tradeoff between refresh overhead and burst tolerance, and characterizes this tradeoff precisely. The algorithm is shown to be near-optimal and achieves, for instance, a 76.6% reduction in worst-case refresh overhead relative to the periodic refresh algorithm for a 250 MHz eDRAM with 10 µs retention time and 16 banks, each with 128 rows. Simulations with Apex-Map synthetic benchmarks and switch lookup table traffic show that VR can almost completely hide the refresh overhead for memory accesses with moderate-to-high multiplexing across memory banks.
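
The two-module split described in the abstract can be illustrated with a toy simulation. The sketch below is not the paper's algorithm and does not reproduce its guarantees or its 76.6% figure; it is a minimal model under assumptions chosen here for illustration: one memory access per cycle, at most one refresh per cycle, a round-robin pointer over banks, and a refresh falling due every `period` cycles. The names `simulate`, `period`, `max_pending`, and `pick_bank` are hypothetical. For the configuration quoted in the abstract, the retention window is 250 MHz × 10 µs = 2500 cycles, and there are 16 × 128 = 2048 rows to refresh within it.

```python
from typing import Callable, Optional, Tuple


def retention_window_cycles(clock_hz: float, retention_s: float) -> int:
    """Number of clock cycles in one data retention period."""
    return round(clock_hz * retention_s)


def simulate(
    banks: int,
    rows_per_bank: int,
    period: int,        # assumption: one refresh falls due every `period` cycles
    max_pending: int,   # assumption: due refreshes tolerated before stalling
    cycles: int,
    pick_bank: Callable[[int, int], Optional[int]],
) -> Tuple[int, int]:
    """Toy decoupled refresh scheduler (illustrative only).

    Returns (forced_idle_cycles, worst_gap), where worst_gap is the largest
    observed number of cycles between consecutive refreshes of the same row,
    treating cycle 0 as an initial refresh of every row.
    """
    last_refresh = [[0] * rows_per_bank for _ in range(banks)]
    next_row = [0] * banks
    ptr = 0       # pointer module: bank to refresh next (round-robin)
    pending = 0   # refreshes that are due but not yet performed
    forced_idle = 0
    worst_gap = 0

    for t in range(1, cycles + 1):
        if t % period == 0:
            pending += 1
        access = pick_bank(t, ptr)  # bank accessed this cycle, or None
        if pending == 0:
            continue
        if access == ptr:
            if pending <= max_pending:
                continue          # postpone the refresh; the access proceeds
            forced_idle += 1      # back-pressure module: stall the access
        # Refresh the next row of the pointed-to bank, then advance the pointer.
        row = next_row[ptr]
        worst_gap = max(worst_gap, t - last_refresh[ptr][row])
        last_refresh[ptr][row] = t
        next_row[ptr] = (row + 1) % rows_per_bank
        ptr = (ptr + 1) % banks
        pending -= 1

    return forced_idle, worst_gap


if __name__ == "__main__":
    print(retention_window_cycles(250e6, 10e-6))  # 2500
    # Accesses that never touch the bank being refreshed: no stalls.
    print(simulate(4, 8, 2, 3, 1000, lambda t, ptr: (ptr + 1) % 4))  # (0, 64)
    # Adversarial accesses that always hit the bank being refreshed.
    print(simulate(4, 8, 2, 3, 1000, lambda t, ptr: ptr))  # (497, 70)
```

In this toy model, accesses spread away from the pointed-to bank hide refresh entirely, while an adversarial pattern eventually forces one idle cycle per due refresh. Raising `max_pending` absorbs longer conflict bursts but lengthens the worst gap between refreshes of a row (from 64 to 70 cycles in the run above), which is one simple way to see why refresh overhead and burst tolerance pull against each other for a fixed retention time.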