Hybrid cache architecture with disparate memory technologies

Authors:
Xiaoxia Wu; Jian Li; Lixin Zhang; Evan Speight; Ram Rajamony; Yuan Xie
Affiliations:
Pennsylvania State University, University Park, PA, USA; IBM Austin Research Lab, Austin, TX, USA; IBM Austin Research Lab, Austin, TX, USA; IBM Austin Research Lab, Austin, TX, USA; IBM Austin Research Lab, Austin, TX, USA; Pennsylvania State University, University Park, PA, USA
Venue:
Proceedings of the 36th Annual International Symposium on Computer Architecture
Year:
2009

Abstract

Caching techniques have been an efficient mechanism for mitigating the effects of the processor-memory speed gap. Traditional multi-level SRAM-based cache hierarchies, especially in the context of chip multiprocessors (CMPs), present many challenges in area requirements, core-to-cache balance, power consumption, and design complexity. New advancements in technology enable caches to be built from other technologies, such as Embedded DRAM (EDRAM), Magnetic RAM (MRAM), and Phase-change RAM (PRAM), in either 2D chips or 3D stacked chips. Caches fabricated in these technologies offer dramatically different power and performance characteristics when compared with SRAM-based caches, particularly in the areas of access latency, cell density, and overall power consumption. In this paper, we propose to take advantage of the best characteristics that each technology offers through the use of Hybrid Cache Architecture (HCA) designs. We discuss and evaluate two types of hybrid cache architectures: inter-cache Level HCA (LHCA), in which the levels in a cache hierarchy can be made of disparate memory technologies; and intra-cache-level, or cache Region-based, HCA (RHCA), in which a single level of cache can be partitioned into multiple regions, each of a different memory technology. We have studied a number of different HCA designs and explored the potential of hardware support for intra-cache data movement and power consumption management within HCA caches. Utilizing a full-system simulator that has been validated against real hardware, we demonstrate that an LHCA design can provide a geometric-mean IPC improvement of 7% over a baseline 3-level SRAM cache design under the same area constraint across a collection of 25 workloads. A more aggressive RHCA-based design provides a 12% IPC improvement over the baseline. Finally, a 2-layer 3D cache stack (3DHCA) of high-density memory technology within the same chip footprint gives an 18% IPC improvement over the baseline. Furthermore, a reduction in power consumption of up to 70% over a baseline SRAM-only design is achieved.
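
The region-based idea (RHCA) with intra-cache data movement can be illustrated with a toy model. The following is only a minimal sketch under stated assumptions, not the design evaluated in the paper: the class name, the LRU policy in each region, the promote-on-hit and demote-on-eviction rules, and all latency values are hypothetical choices made for illustration.

```python
from collections import OrderedDict


class RegionHybridCache:
    """Toy single cache level split into a small fast region and a large slow region.

    Hypothetical model: the fast region stands in for a low-latency technology
    (e.g. SRAM) and the slow region for a denser one. Latencies are made-up
    cycle counts, not figures from the paper.
    """

    def __init__(self, fast_lines, slow_lines,
                 fast_latency=1, slow_latency=3, miss_latency=20):
        self.fast_lines = fast_lines
        self.slow_lines = slow_lines
        self.fast_latency = fast_latency
        self.slow_latency = slow_latency
        self.miss_latency = miss_latency
        # OrderedDicts keep LRU order: least recently used entry first.
        self.fast = OrderedDict()
        self.slow = OrderedDict()

    def access(self, addr):
        """Return the assumed latency of accessing the line at addr."""
        if addr in self.fast:
            self.fast.move_to_end(addr)  # refresh LRU position
            return self.fast_latency
        if addr in self.slow:
            # Intra-cache data movement: promote the reused line to the fast region.
            del self.slow[addr]
            self._insert_fast(addr)
            return self.slow_latency
        # Miss in both regions: fill the new line into the slow region.
        self._insert_slow(addr)
        return self.miss_latency

    def _insert_fast(self, addr):
        if len(self.fast) >= self.fast_lines:
            # Demote the fast region's LRU line instead of dropping it.
            victim, _ = self.fast.popitem(last=False)
            self._insert_slow(victim)
        self.fast[addr] = True

    def _insert_slow(self, addr):
        if len(self.slow) >= self.slow_lines:
            self.slow.popitem(last=False)  # evict the LRU line from the cache
        self.slow[addr] = True


cache = RegionHybridCache(fast_lines=1, slow_lines=2)
latencies = [cache.access(a) for a in (0xA, 0xA, 0xA, 0xB, 0xB, 0xA)]
print(latencies)
```

In this sketch a line is filled into the slow region on a miss, moves to the fast region on its first reuse, and is demoted rather than discarded when the fast region is full. A real RHCA policy would also have to weigh the cost of the movement itself, which this model ignores.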