Way adaptable D-NUCA caches

Authors:
Alessandro Bardine;Manuel Comparetti;Pierfrancesco Foglia;Giacomo Gabrielli;Cosimo Antonio Prete
Affiliations:
Dipartimento di Ingegneria dell'Informazione, Università di Pisa, Largo Lucio Lazzarino, 56122 Pisa, Italy.;Dipartimento di Ingegneria dell'Informazione, Università di Pisa, Largo Lucio Lazzarino, 56122 Pisa, Italy.;Dipartimento di Ingegneria dell
Venue:
International Journal of High Performance Systems Architecture
Year:
2010

Abstract

Non-uniform cache architecture (NUCA) aims to limit the wire-delay problem typical of large on-chip last-level caches. A large cache is partitioned into several banks, each with a latency that depends on its physical location, and a scalable on-chip network interconnects the banks with the cache controller; together these reduce the average access latency with respect to a traditional cache. Adding a migration mechanism that moves the most frequently accessed data towards the cache controller (D-NUCA) further improves the average access latency. In this work we propose a last-level cache design based on the D-NUCA scheme, the way adaptable D-NUCA cache, which significantly limits its static power consumption by dynamically adapting to the needs of the running application. This design leads to a fast and power-efficient memory hierarchy, with an average 31.2% reduction in energy-delay product (EDP) with respect to a traditional D-NUCA. We propose and discuss a methodology for tuning the intrinsic parameters of our design, and we investigate the adoption of the way adaptable D-NUCA scheme as a shared L2 cache in a chip multiprocessor (CMP) system, where it yields a 24% reduction in EDP.
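The abstract reports its results as relative reductions in energy-delay product. As a minimal sketch of how such a figure is computed, assuming the usual definition of EDP as energy multiplied by execution time, the snippet below compares a baseline with an adaptive configuration. The function names and all numeric inputs are hypothetical and chosen only for illustration; they are not measurements from the paper.

```python
def energy_delay_product(energy_joules, delay_seconds):
    """EDP under the common definition: energy consumed times execution time."""
    return energy_joules * delay_seconds


def relative_reduction(baseline, improved):
    """Fractional reduction of `improved` with respect to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline


# Hypothetical inputs (NOT from the paper): the adaptive cache saves energy
# by switching off ways, at the cost of a slightly longer run time.
baseline_edp = energy_delay_product(energy_joules=10.0, delay_seconds=2.0)
adaptive_edp = energy_delay_product(energy_joules=8.0, delay_seconds=2.2)

print(f"EDP reduction: {relative_reduction(baseline_edp, adaptive_edp):.1%}")
```

Because EDP multiplies the two quantities, a configuration that runs slightly slower can still come out ahead when its energy saving is proportionally larger than its slowdown.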