Drowsy caches: simple techniques for reducing leakage power
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Managing Wire Delay in Large Chip-Multiprocessor Caches
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Mambo: a full system simulator for the PowerPC architecture
ACM SIGMETRICS Performance Evaluation Review - Special issue on tools for computer architecture research
Optimizing Replication, Communication, and Capacity Allocation in CMPs
Proceedings of the 32nd annual international symposium on Computer Architecture
A NUCA substrate for flexible CMP cache sharing
Proceedings of the 19th annual international conference on Supercomputing
Demystifying 3D ICs: The Pros and Cons of Going Vertical
IEEE Design & Test
Bridging the Processor-Memory Performance Gapwith 3D IC Technology
IEEE Design & Test
Design and Management of 3D Chip Multiprocessors Using Network-in-Memory
Proceedings of the 33rd annual international symposium on Computer Architecture
Design space exploration for 3D architectures
ACM Journal on Emerging Technologies in Computing Systems (JETC)
POWER5 System microarchitecture
IBM Journal of Research and Development - POWER5 and packaging
Die Stacking (3D) Microarchitecture
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Optimizing NUCA Organizations and Wiring Alternatives for Large Caches with CACTI 6.0
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
3D-Stacked Memory Architectures for Multi-core Processors
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Proceedings of the 45th annual Design Automation Conference
The PARSEC benchmark suite: characterization and architectural implications
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
System-level cost analysis and design exploration for three-dimensional integrated circuits (3D ICs)
Proceedings of the 2009 Asia and South Pacific Design Automation Conference
Power and performance of read-write aware hybrid caches with non-volatile memories
Proceedings of the Conference on Design, Automation and Test in Europe
Variation-tolerant non-uniform 3D cache management in die stacked multicore processor
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
PCRAMsim: system-level performance, energy, and area modeling for phase-change ram
Proceedings of the 2009 International Conference on Computer-Aided Design
Dynamically replicated memory: building reliable systems from nanoscale resistive memories
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Resistive computation: avoiding the power wall with low-leakage, STT-MRAM based computing
Proceedings of the 37th annual international symposium on Computer architecture
Proceedings of the 37th annual international symposium on Computer architecture
Cost-driven 3D integration with interconnect layers
Proceedings of the 47th Design Automation Conference
An energy efficient cache design using spin torque transfer (STT) RAM
Proceedings of the 16th ACM/IEEE international symposium on Low power electronics and design
Energy- and endurance-aware design of phase change memory caches
Proceedings of the Conference on Design, Automation and Test in Europe
A frequent-value based PRAM memory architecture
Proceedings of the 16th Asia and South Pacific Design Automation Conference
Design techniques to improve the device write margin for MRAM-based cache memory
Proceedings of the 21st edition of the great lakes symposium on Great lakes symposium on VLSI
Moguls: a model to explore the memory hierarchy for bandwidth improvements
Proceedings of the 38th annual international symposium on Computer architecture
Proceedings of the 17th IEEE/ACM international symposium on Low-power electronics and design
Wear rate leveling: lifetime enhancement of PRAM with endurance variation
Proceedings of the 48th Design Automation Conference
A read-write aware replacement policy for phase change memory
APPT'11 Proceedings of the 9th international conference on Advanced parallel processing technologies
Energy efficient many-core processor for recognition and mining using spin-based memory
NANOARCH '11 Proceedings of the 2011 IEEE/ACM International Symposium on Nanoscale Architectures
Efficient page caching algorithm with prediction and migration for a hybrid main memory
ACM SIGAPP Applied Computing Review
Bandwidth-aware reconfigurable cache design with hybrid memory technologies
Proceedings of the International Conference on Computer-Aided Design
Efficient memory management of a hierarchical and a hybrid main memory for MN-MATE platform
Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores
Performance/Thermal-Aware Design of 3D-Stacked L2 Caches for CMPs
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Proceedings of the 49th Annual Design Automation Conference
Proceedings of the 49th Annual Design Automation Conference
Constructing large and fast multi-level cell STT-MRAM based cache for embedded processors
Proceedings of the 49th Annual Design Automation Conference
Compiler-assisted preferred caching for embedded systems with STT-RAM based hybrid cache
Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems
A dual-mode architecture for fast-switching STT-RAM
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
A software approach for combating asymmetries of non-volatile memories
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
Design of low power 3D hybrid memory by non-volatile CBRAM-crossbar with block-level data-retention
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
Static and dynamic co-optimizations for blocks mapping in hybrid caches
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
MAC: migration-aware compilation for STT-RAM based hybrid cache in embedded systems
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
Hybrid nonvolatile disk cache for energy-efficient and high-performance systems
ACM Transactions on Design Automation of Electronic Systems (TODAES) - Special section on adaptive power management for energy and temperature-aware computing systems
Proceedings of the International Conference on Computer-Aided Design
Write activity reduction on non-volatile main memories for embedded chip multiprocessors
ACM Transactions on Embedded Computing Systems (TECS)
Coordinating prefetching and STT-RAM based last-level cache management for multicore systems
Proceedings of the 23rd ACM international conference on Great lakes symposium on VLSI
Adaptive cache management for a combined SRAM and DRAM cache hierarchy for multi-cores
Proceedings of the Conference on Design, Automation and Test in Europe
Combining RAM technologies for hard-error recovery in L1 data caches working at very-low power modes
Proceedings of the Conference on Design, Automation and Test in Europe
D-MRAM cache: enhancing energy efficiency with 3T-1MTJ DRAM/MRAM hybrid memory
Proceedings of the Conference on Design, Automation and Test in Europe
DWM-TAPESTRI - an energy efficient all-spin cache using domain wall shift based writes
Proceedings of the Conference on Design, Automation and Test in Europe
Lighting the dark silicon by exploiting heterogeneity on future processors
Proceedings of the 50th Annual Design Automation Conference
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Pragmatic integration of an SRAM row cache in heterogeneous 3-D DRAM architecture using TSV
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Exploring the vulnerability of CMPs to soft errors with 3D stacked nonvolatile memory
ACM Journal on Emerging Technologies in Computing Systems (JETC)
Dynamically reconfigurable hybrid cache: an energy-efficient last-level cache design
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Minimizing accumulative memory load cost on multi-core DSPs with multi-level memory
Journal of Systems Architecture: the EUROMICRO Journal
Kiln: closing the performance gap between systems with and without persistence support
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
ARI: Adaptive LLC-memory traffic management
ACM Transactions on Architecture and Code Optimization (TACO)
C1C: A configurable, compiler-guided STT-RAM L1 cache
ACM Transactions on Architecture and Code Optimization (TACO)
An efficient run-time encryption scheme for non-volatile main memory
Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems
ACM Transactions on Embedded Computing Systems (TECS)
AMBER: adaptive energy management for on-chip hybrid video memories
Proceedings of the International Conference on Computer-Aided Design
Unleashing the potential of MLC STT-RAM caches
Proceedings of the International Conference on Computer-Aided Design
System-level impacts of persistent main memory using a search engine
Microelectronics Journal
Hi-index | 0.00 |
Caching techniques have been an efficient mechanism for mitigating the effects of the processor-memory speed gap. Traditional multi-level SRAM-based cache hierarchies, especially in the context of chip multiprocessors (CMPs), present many challenges in area requirements, core-to-cache balance, power consumption, and design complexity. New advancements in technology enable caches to be built from other technologies, such as Embedded DRAM (EDRAM), Magnetic RAM (MRAM), and Phase-change RAM (PRAM), in both 2D chips or 3D stacked chips. Caches fabricated in these technologies offer dramatically different power and performance characteristics when compared with SRAM-based caches, particularly in the areas of access latency, cell density, and overall power consumption. In this paper, we propose to take advantage of the best characteristics that each technology offers, through the use of Hybrid Cache Architecture (HCA) designs. We discuss and evaluate two types of hybrid cache architectures: inter cache Level HCA (LHCA), in which the levels in a cache hierarchy can be made of disparate memory technologies; and intra cache level or cache Region based HCA (RHCA), where a single level of cache can be partitioned into multiple regions, each of a different memory technology. We have studied a number of different HCA architectures and explored the potential of hardware support for intra-cache data movement and power consumption management within HCA caches. Utilizing a full-system simulator that has been validated against real hardware, we demonstrate that an LHCA design can provide a geometric mean 7% IPC improvement over a baseline 3-level SRAM cache design under the same area constraint across a collection of 25 workloads. A more aggressive RHCA-based design provides 12% IPC improvement over the baseline. Finally, a 2-layer 3D cache stack (3DHCA) of high density memory technology within the same chip footprint gives 18% IPC improvement over the baseline. Furthermore, up to 70% reduction in power consumption over a baseline SRAM-only design is achieved.