Tiered Memory: An Iso-Power Memory Architecture to Address the Memory Power Wall

  • Authors:
  • Kshitij Sudan; Karthick Rajamani; Wei Huang; John B. Carter

  • Affiliations:
  • University of Utah, Salt Lake City; IBM Austin Research Lab, Austin; IBM Austin Research Lab, Austin; IBM Austin Research Lab, Austin

  • Venue:
  • IEEE Transactions on Computers
  • Year:
  • 2012

Abstract

Moore's Law improvements in transistor density are driving a rapid increase in the number of cores per processor. DRAM device capacity and energy efficiency are improving at a slower pace, so the importance of DRAM power is growing. This trend presents system designers with two nominal options for future systems: 1) decrease off-chip memory capacity and bandwidth per core, or 2) increase the fraction of system power allocated to main memory. Reducing capacity and bandwidth leads to imbalanced systems with poor processor utilization for non-cache-resident applications, so designers have chosen to increase the DRAM power budget. This choice has been viable to date, but is fast running into a memory power wall. To address the looming memory power wall, we propose a novel iso-power tiered memory architecture that supports 2-3x more memory capacity for the same power budget as traditional designs by aggressively exploiting low-power DRAM modes. We employ two "tiers" of DRAM: a "hot" tier with active DRAM and a "cold" tier in which DRAM is placed in self-refresh mode. The DRAM capacity of each tier is adjusted dynamically based on aggregate workload requirements, and the most frequently accessed data are migrated to the "hot" tier. This design allows larger memory capacities at a fixed power budget while mitigating the performance impact of using low-power DRAM modes. We target our solution at server consolidation scenarios, where physical memory capacity is typically the primary factor limiting the number of virtual machines a server can support. Using iso-power tiered memory, we can run 3x as many virtual machines, achieving a 250 percent improvement in average aggregate performance compared to a conventional memory design with the same power budget.
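
The abstract describes an epoch-based scheme that sizes the "hot" tier to a fixed power budget and migrates the most frequently accessed data into it. The following is a minimal Python sketch of such a policy under simple assumptions; the per-rank power figures, the power budget, the page-level access counting, and all function names are illustrative placeholders, not the paper's actual mechanism or parameters.

```python
# Hypothetical sketch of an iso-power hot/cold tier sizing and migration policy.
# All power numbers and the epoch-based page-count policy are illustrative assumptions.

ACTIVE_POWER_W = 5.0        # assumed power of a DRAM rank kept active ("hot" tier)
SELF_REFRESH_POWER_W = 0.5  # assumed power of a rank in self-refresh ("cold" tier)
POWER_BUDGET_W = 24.0       # fixed memory power budget (the iso-power constraint)

def size_hot_tier(total_ranks: int) -> int:
    """Largest number of hot ranks such that hot ranks plus the remaining
    cold (self-refresh) ranks still fit inside the fixed power budget."""
    for hot in range(total_ranks, -1, -1):
        power = hot * ACTIVE_POWER_W + (total_ranks - hot) * SELF_REFRESH_POWER_W
        if power <= POWER_BUDGET_W:
            return hot
    return 0

def rebalance(page_access_counts: dict[int, int], pages_per_rank: int,
              total_ranks: int) -> tuple[set[int], set[int]]:
    """End-of-epoch rebalancing: the most frequently accessed pages are
    assigned to the hot tier; everything else lives in the cold tier."""
    hot_ranks = size_hot_tier(total_ranks)
    hot_capacity = hot_ranks * pages_per_rank
    ranked = sorted(page_access_counts, key=page_access_counts.get, reverse=True)
    return set(ranked[:hot_capacity]), set(ranked[hot_capacity:])

if __name__ == "__main__":
    # 8 ranks of 4 pages each; lower-numbered pages are accessed more often.
    counts = {page: 100 - page for page in range(32)}
    hot_pages, cold_pages = rebalance(counts, pages_per_rank=4, total_ranks=8)
    print(f"hot ranks within budget: {size_hot_tier(8)}")
    print(f"hot pages: {sorted(hot_pages)}")
```

With these assumed numbers, only 4 of the 8 ranks can stay active within the budget, so the 16 hottest pages occupy the hot tier while the rest are held in ranks kept in self-refresh, mirroring the trade-off the abstract describes: more total capacity at the same power, at the cost of slower access to cold data.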