Multiple sub-row buffers in DRAM: unlocking performance and energy improvement opportunities

Authors:
Nagendra Dwarakanath Gulur;R. Manikantan;Mahesh Mehendale;R. Govindarajan
Affiliations:
Texas Instruments (India) Pvt Ltd, Bangalore, India;Indian Institute of Science, Bangalore, India;Texas Instruments (India) Pvt Ltd, Bangalore, India;Indian Institute of Science, Bangalore, India
Venue:
Proceedings of the 26th ACM international conference on Supercomputing
Year:
2012

Citing 20
Cited 0

Memory access scheduling

Proceedings of the 27th annual international symposium on Computer architecture
A permutation-based page interleaving scheme to reduce row-buffer conflicts and exploit data locality

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Symbiotic jobscheduling for a simultaneous multithreaded processor

ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Memory Controller Optimizations for Web Servers

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Improving energy efficiency by making DRAM less randomly accessed

ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
The M5 Simulator: Modeling Networked Systems

IEEE Micro
Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors

Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Self-Optimizing Memory Controllers: A Reinforcement Learning Approach

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Memory Access Scheduling Schemes for Systems with Multi-Core Processors

ICPP '08 Proceedings of the 2008 37th International Conference on Parallel Processing
PowerNap: eliminating server idle power

Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Prefetch-Aware DRAM Controllers

Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Mini-rank: Adaptive DRAM architecture for improving memory power efficiency

Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Architecting phase change memory as a scalable dram alternative

Proceedings of the 36th annual international symposium on Computer architecture
Multicore DIMM: an Energy Efficient Memory Module with Independently Controlled DRAMs

IEEE Computer Architecture Letters
The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines

The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines
Micro-pages: increasing DRAM efficiency with locality-aware data placement

Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Rethinking DRAM design and organization for energy-constrained multi-cores

Proceedings of the 37th annual international symposium on Computer architecture
Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Adaptive granularity memory systems: a tradeoff between storage efficiency and throughput

Proceedings of the 38th annual international symposium on Computer architecture

Quantified Score

Hi-index	0.00

Visualization

Abstract

The twin demands of energy-efficiency and higher performance on DRAM are highly emphasized in multicore architectures. A variety of schemes have been proposed to address either the latency or the energy consumption of DRAMs. These schemes typically require non-trivial hardware changes and end up improving latency at the cost of energy or vice-versa. One specific DRAM performance problem in multicores is that interleaved accesses from different cores can potentially degrade row-buffer locality. In this paper, based on the temporal and spatial locality characteristics of memory accesses, we propose a reorganization of the existing single large row-buffer in a DRAM bank into multiple sub-row buffers (MSRB). This re-organization not only improves row hit rates, and hence the average memory latency, but also brings down the energy consumed by the DRAM. The first major contribution of this work is proposing such a reorganization without requiring any significant changes to the existing widely accepted DRAM specifications. Our proposed reorganization improves weighted speedup by 35.8%, 14.5% and 21.6% in quad, eight and sixteen core workloads along with a 42%, 28% and 31% reduction in DRAM energy. The proposed MSRB organization enables opportunities for the management of multiple row-buffers at the memory controller level. As the memory controller is aware of the behaviour of individual cores it allows us to implement coordinated buffer allocation schemes for different cores that take into account program behaviour. We demonstrate two such schemes, namely Fairness Oriented Allocation and Performance Oriented Allocation, which show the flexibility that memory controllers can now exploit in our MSRB organization to improve overall performance and/or fairness. Further, the MSRB organization enables additional opportunities for DRAM intra-bank parallelism and selective early precharging of the LRU row-buffer to further improve memory access latencies. These two optimizations together provide an additional 5.9% performance improvement.