Embedded DRAM architectural trade-offs
Proceedings of the conference on Design, automation and test in Europe
Architecture Independent Performance Characterization and Benchmarking for Scientific Applications
MASCOTS '04 Proceedings of the The IEEE Computer Society's 12th Annual International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems
Apex-Map: A Global Data Access Benchmark to Analyze HPC Systems and Parallel Programming Paradigms
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Quantifying Locality In The Memory Access Patterns of HPC Applications
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Self-Optimizing Memory Controllers: A Reinforcement Learning Approach
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
SystemVerilog for Verification, Second Edition: A Guide to Learning the Testbench Language Features
SystemVerilog for Verification, Second Edition: A Guide to Learning the Testbench Language Features
Embedded DRAM: technology platform for the Blue Gene/L chip
IBM Journal of Research and Development
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Characterizing the Relation Between Apex-Map Synthetic Probes and Reuse Distance Distributions
ICPP '10 Proceedings of the 2010 39th International Conference on Parallel Processing
Elastic Refresh: Techniques to Mitigate Refresh Penalties in High Density Memory
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Versatile refresh: low complexity refresh scheduling for high-throughput multi-banked eDRAM
Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE joint international conference on Measurement and Modeling of Computer Systems
Versatile refresh: low complexity refresh scheduling for high-throughput multi-banked eDRAM
Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE joint international conference on Measurement and Modeling of Computer Systems
An energy-efficient and scalable eDRAM-based register file architecture for GPGPU
Proceedings of the 40th Annual International Symposium on Computer Architecture
Hi-index | 0.00 |
Multi-banked embedded DRAM (eDRAM) has become increasingly popular in high-performance systems. However, the data retention problem of eDRAM is exacerbated by the larger number of banks and the high-performance environment in which it is deployed: The data retention time of each memory cell decreases while the number of cells to be refreshed increases. For this, multi-bank designs offer a concurrent refresh mode, where idle banks can be refreshed concurrently during read and write operations. However, conventional techniques such as periodically scheduling refreshes---with priority given to refreshes in case of conflicts with reads or writes---have variable performance, increase read latency, and can perform poorly in worst case memory access patterns. We propose a novel refresh scheduling algorithm that is low-complexity, produces near-optimal throughput with universal guarantees, and is tolerant to bursty memory access patterns. The central idea is to decouple the scheduler into two simple-to-implement modules: one determines which cell to refresh next and the other determines when to force an idle cycle in all banks. We derive necessary and sufficient conditions to guarantee data integrity for all access patterns, with any given number of banks, rows per bank, read/write ports and data retention time. Our analysis shows that there is a tradeoff between refresh overhead and burst tolerance and characterizes this tradeoff precisely. The algorithm is shown to be near-optimal and achieves, for instance, 76.6% reduction in worst-case refresh overhead from the periodic refresh algorithm for a 250MHz eDRAM with 10us retention time and 16 banks each with 128 rows. Simulations with Apex-Map synthetic benchmarks and switch lookup table traffic show that VR can almost completely hide the refresh overhead for memory accesses with moderate-to-high multiplexing across memory banks.