Rethinking DRAM design and organization for energy-constrained multi-cores

Authors:
Aniruddha N. Udipi;Naveen Muralimanohar;Niladrish Chatterjee;Rajeev Balasubramonian;Al Davis;Norman P. Jouppi
Affiliations:
University of Utah, Salt Lake City, UT, USA;Hewlett-Packard Laboratories, Palo Alto, CA, USA;University of Utah, Salt Lake City, UT, USA;University of Utah, Salt Lake City, UT, USA;University of Utah, Salt Lake City, UT, USA;Hewlett-Packard Laboratories, Palo Alto, CA, USA
Venue:
Proceedings of the 37th annual international symposium on Computer architecture
Year:
2010

Abstract

DRAM vendors have traditionally optimized the cost-per-bit metric, often making design decisions that incur energy penalties. A prime example is the overfetch feature in DRAM, where a single request activates thousands of bit-lines in many DRAM chips, only to return a single cache line to the CPU. The focus on cost-per-bit is questionable in modern-day servers where operating costs can easily exceed the purchase cost. Modern technology trends are also placing very different demands on the memory system: (i) queuing delays are a significant component of memory access time, (ii) there is a high energy premium for the level of reliability expected for business-critical computing, and (iii) the memory access stream emerging from multi-core systems exhibits limited locality. All of these trends necessitate an overhaul of DRAM architecture, even if it means a slight compromise in the cost-per-bit metric. This paper examines three primary innovations. The first is a modification to DRAM chip microarchitecture that retains the traditional DDRx SDRAM interface. Selective Bit-line Activation (SBA) waits for both RAS (row address) and CAS (column address) signals to arrive before activating exactly those bit-lines that provide the requested cache line. SBA reduces energy consumption while incurring slight area and performance penalties. The second innovation, Single Subarray Access (SSA), fundamentally re-organizes the layout of DRAM arrays and the mapping of data to these arrays so that an entire cache line is fetched from a single subarray. It requires a different interface to the memory controller, reduces dynamic and background energy (by about 6X), incurs a slight area penalty (4%), and can even lead to performance improvements (54% on average) by reducing queuing delays. The third innovation further penalizes the cost-per-bit metric by adding a checksum feature to each cache line. This checksum error-detection feature can then be used to build stronger RAID-like fault tolerance, including chipkill-level reliability. Such a technique is especially crucial for the SSA architecture where the entire cache line is localized to a single chip. This DRAM chip microarchitectural change leads to a dramatic reduction in the energy and storage overheads for reliability. The proposed architectures will also apply to other emerging memory technologies (such as resistive memories) and will be less disruptive to standards, interfaces, and the design flow if they can be incorporated into first-generation designs.
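The pairing of a per-line checksum with RAID-like redundancy can be illustrated with a toy model. The sketch below is not the paper's design: the 64-byte line size, the 8-bit additive checksum, the single XOR parity line, and all function names are assumptions chosen only to show the idea that the checksum locates a bad line and parity across the other chips rebuilds it.

```python
from functools import reduce

LINE_BYTES = 64  # assumed cache-line size, not taken from the abstract


def checksum(line: bytes) -> int:
    """Toy 8-bit additive checksum (hypothetical; far weaker than a real code)."""
    return sum(line) % 256


def xor_lines(lines):
    """Byte-wise XOR of equally sized lines."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*lines))


def write_stripe(lines):
    """Store each line with its checksum (one line per chip) plus one XOR parity line."""
    stored = [(bytes(line), checksum(line)) for line in lines]
    return stored, xor_lines(lines)


def read_line(stored, parity, chip):
    """Return the line held by `chip`, rebuilding it from parity if its checksum fails."""
    data, expected = stored[chip]
    if checksum(data) == expected:
        return data  # common case: only one chip is read
    # Detection failed the check, so fall back to RAID-like reconstruction.
    # Simplification: the stored checksum is assumed to survive the fault.
    others = [d for i, (d, _) in enumerate(stored) if i != chip]
    recovered = xor_lines(others + [parity])
    if checksum(recovered) != expected:
        raise ValueError("uncorrectable error: more than one line is bad")
    return recovered


# Usage: eight data chips, one corrupted line, transparent recovery.
lines = [bytes((i * 7 + j) % 256 for j in range(LINE_BYTES)) for i in range(8)]
stored, parity = write_stripe(lines)
bad = bytes([lines[3][0] ^ 0x01]) + lines[3][1:]  # flip one bit in chip 3
stored[3] = (bad, stored[3][1])
assert read_line(stored, parity, 3) == lines[3]
```

In this model an error-free read touches a single chip, and the parity line is consulted only when the checksum flags a mismatch, which is the intuition behind keeping reliability overheads low when a whole cache line lives in one chip.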