Design exploration of hybrid caches with disparate memory technologies

  • Authors:
  • Xiaoxia Wu; Jian Li; Lixin Zhang; Evan Speight; Ram Rajamony; Yuan Xie

  • Affiliations:
  • The Pennsylvania State University, University Park, PA; IBM Austin Research Laboratory, Austin, TX; IBM Austin Research Laboratory, Austin, TX; IBM Austin Research Laboratory, Austin, TX; IBM Austin Research Laboratory, Austin, TX; The Pennsylvania State University, University Park, PA

  • Venue:
  • ACM Transactions on Architecture and Code Optimization (TACO)
  • Year:
  • 2010

Abstract

Traditional multilevel SRAM-based cache hierarchies, especially in the context of chip multiprocessors (CMPs), present many challenges in area requirements, core-to-cache balance, power consumption, and design complexity. Advances in technology enable caches to be built from other memory technologies, such as Embedded DRAM (EDRAM), Magnetic RAM (MRAM), and Phase-change RAM (PRAM), in both 2D chips and 3D stacked chips. Caches fabricated in these technologies offer dramatically different power-performance characteristics from SRAM-based caches, particularly in access latency, cell density, and overall power consumption. In this article, we propose to take advantage of the best characteristics that each technology has to offer through the use of Hybrid Cache Architecture (HCA) designs. We discuss and evaluate two types of hybrid cache architectures: intercache-Level HCA (LHCA), in which the levels in a cache hierarchy can be made of disparate memory technologies; and intracache-level or cache-Region-based HCA (RHCA), in which a single level of cache can be partitioned into multiple regions, each of a different memory technology. We have studied a number of different HCA architectures and explored the potential of hardware support for intracache data movement and power consumption management within HCA caches. Using a full-system simulator that has been validated against real hardware, we demonstrate that an LHCA design can provide a geometric mean 6% IPC improvement over a baseline 3-level SRAM cache design under the same area constraint across a collection of 30 workloads. A more aggressive RHCA-based design provides a 10% IPC improvement over the baseline. A 2-layer 3D cache stack (3DHCA) of high-density memory technology within the same chip footprint gives a 16% IPC improvement over the baseline. We also achieve up to a 72% reduction in power consumption over a baseline SRAM-only design.
Energy-delay and thermal evaluations for 3DHCA are also presented. In addition to the fast-slow region-based RHCA, we further evaluate read-write region-based RHCA designs.
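
The abstract reports IPC gains as geometric means across workloads. As a minimal sketch of how such a figure is typically computed (the function name and the IPC values below are hypothetical illustrations, not data or code from the article), the per-workload IPC ratios are averaged in log space:

```python
import math


def geomean_improvement(baseline_ipc, candidate_ipc):
    """Return the geometric-mean IPC improvement as a fraction (0.06 == 6%).

    Both arguments are equal-length sequences of per-workload IPC values.
    """
    if len(baseline_ipc) != len(candidate_ipc) or not baseline_ipc:
        raise ValueError("need two non-empty sequences of equal length")
    # Average the log of each per-workload speedup, then exponentiate.
    log_sum = sum(math.log(c / b) for b, c in zip(baseline_ipc, candidate_ipc))
    return math.exp(log_sum / len(baseline_ipc)) - 1.0


# Hypothetical IPC values for three workloads (NOT from the article).
baseline = [1.0, 2.0, 0.5]
hybrid = [1.1, 2.0, 0.55]
improvement = geomean_improvement(baseline, hybrid)
```

A geometric mean is the usual choice for ratios like these because, unlike an arithmetic mean of speedups, it does not depend on which design is taken as the reference.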