The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
Drowsy caches: simple techniques for reducing leakage power
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Distance Associativity for High-Performance Energy-Efficient Non-Uniform Cache Architectures
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Effective stream-based and execution-based data prefetching
Proceedings of the 18th annual international conference on Supercomputing
Managing Wire Delay in Large Chip-Multiprocessor Caches
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Mambo: a full system simulator for the PowerPC architecture
ACM SIGMETRICS Performance Evaluation Review - Special issue on tools for computer architecture research
Optimizing Replication, Communication, and Capacity Allocation in CMPs
Proceedings of the 32nd annual international symposium on Computer Architecture
A NUCA substrate for flexible CMP cache sharing
Proceedings of the 19th annual international conference on Supercomputing
Demystifying 3D ICs: The Pros and Cons of Going Vertical
IEEE Design & Test
Placement and Routing in 3D Integrated Circuits
IEEE Design & Test
Bridging the Processor-Memory Performance Gapwith 3D IC Technology
IEEE Design & Test
Design and Management of 3D Chip Multiprocessors Using Network-in-Memory
Proceedings of the 33rd annual international symposium on Computer Architecture
Design space exploration for 3D architectures
ACM Journal on Emerging Technologies in Computing Systems (JETC)
POWER5 System microarchitecture
IBM Journal of Research and Development - POWER5 and packaging
PicoServer: using 3D stacking technology to enable a compact energy efficient chip multiprocessor
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Die Stacking (3D) Microarchitecture
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Optimizing NUCA Organizations and Wiring Alternatives for Large Caches with CACTI 6.0
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
A low-power phase change memory based hybrid cache architecture
Proceedings of the 18th ACM Great Lakes symposium on VLSI
3D-Stacked Memory Architectures for Multi-core Processors
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Proceedings of the 45th annual Design Automation Conference
Accurate, Pre-RTL Temperature-Aware Design Using a Parameterized, Geometric Thermal Model
IEEE Transactions on Computers
The PARSEC benchmark suite: characterization and architectural implications
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
System-level cost analysis and design exploration for three-dimensional integrated circuits (3D ICs)
Proceedings of the 2009 Asia and South Pacific Design Automation Conference
Architecting phase change memory as a scalable dram alternative
Proceedings of the 36th annual international symposium on Computer architecture
A durable and energy efficient main memory using phase change memory technology
Proceedings of the 36th annual international symposium on Computer architecture
Scalable high performance main memory system using phase-change memory technology
Proceedings of the 36th annual international symposium on Computer architecture
Power and performance of read-write aware hybrid caches with non-volatile memories
Proceedings of the Conference on Design, Automation and Test in Europe
TapeCache: a high density, energy efficient cache based on domain wall memory
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
A compression-based area-efficient recovery architecture for nonvolatile processors
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Low-energy volatile STT-RAM cache design using cache-coherence-enabled adaptive refresh
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Hi-index | 0.00 |
Traditional multilevel SRAM-based cache hierarchies, especially in the context of chip multiprocessors (CMPs), present many challenges in area requirements, core--to--cache balance, power consumption, and design complexity. New advancements in technology enable caches to be built from other technologies, such as Embedded DRAM (EDRAM), Magnetic RAM (MRAM), and Phase-change RAM (PRAM), in both 2D chips or 3D stacked chips. Caches fabricated in these technologies offer dramatically different power-performance characteristics when compared with SRAM-based caches, particularly in the areas of access latency, cell density, and overall power consumption. In this article, we propose to take advantage of the best characteristics that each technology has to offer through the use of Hybrid Cache Architecture (HCA) designs. We discuss and evaluate two types of hybrid cache architectures: intercache-Level HCA (LHCA), in which the levels in a cache hierarchy can be made of disparate memory technologies; and intracache-level or cache-Region-based HCA (RHCA), where a single level of cache can be partitioned into multiple regions, each of a different memory technology. We have studied a number of different HCA architectures and explored the potential of hardware support for intracache data movement and power consumption management within HCA caches. Utilizing a full-system simulator that has been validated against real hardware, we demonstrate that an LHCA design can provide a geometric mean 6% IPC improvement over a baseline 3-level SRAM cache design under the same area constraint across a collection of 30 workloads. A more aggressive RHCA-based design provides 10% IPC improvement over the baseline. A 2-layer 3D cache stack (3DHCA) of high density memory technology within the same chip footprint gives 16% IPC improvement over the baseline. We also achieve up to a 72% reduction in power consumption over a baseline SRAM-only design. Energy-delay and thermal evaluation for 3DHCA are also presented. In addition to the fast-slow region based RHCA, we further evaluate read-write region based RHCA designs.