Link-time optimization for power efficiency in a tagless instruction cache

Authors:
Timothy M. Jones;Sandro Bartolini;Jonas Maebe;Dominique Chanet
Affiliations:
School of Informatics, University of Edinburgh, United Kingdom;Faculty of Engineering, University of Siena, Italy;ELIS Department, Ghent University, Belgium;Gateway Architecture Group, Technicolor, Belgium
Venue:
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Year:
2011

Abstract

The instruction cache is a critical component in any microprocessor. It must have high performance to enable fetching of instructions on every cycle. However, current designs waste a large amount of energy on each access, as tags and data banks from all cache ways are consulted in parallel to fetch the correct instructions as quickly as possible. Existing approaches to reduce this overhead remove unnecessary accesses to the data banks or to the ways that are not likely to hit. However, tag banks still need to be checked. This paper considers a new hybrid hardware and linker-assisted approach to tagless instruction caching. Our novel cache architecture, supported by the compilation toolchain, removes the need for tag checks entirely for the majority of cache accesses. The linker places frequently-executed instructions in specific program regions that are then mapped into the cache without the need for tag checks. This requires only minor hardware modifications and no ISA changes, and it works across cache configurations. Our approach keeps the software and hardware independent, resulting in both backward and forward compatibility. Evaluation on a superscalar processor with and without SMT support shows power savings of 66% within the instruction cache with no loss of performance. This translates to a 49% saving when considering the combined power of the instruction cache and translation lookaside buffer, which is involved in managing our tagless scheme.
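
To make the linker's role more concrete, the following is a minimal sketch of one way frequently-executed code could be chosen for a fixed-size tagless region. It is not the paper's algorithm: the greedy "hottest code per byte" heuristic, the `Block` type, the function names, and the profile numbers are all assumptions made for illustration, and the abstract does not say how the region is sized or how hot code is identified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    name: str        # hypothetical code-block label
    size: int        # bytes of code in the block
    exec_count: int  # profiled executions (assumed to come from a profile run)


def place_hot_blocks(blocks, region_bytes):
    """Greedily fill a fixed-size region with the hottest code per byte.

    Illustrative heuristic only; blocks that do not fit are skipped.
    """
    ranked = sorted(blocks, key=lambda b: b.exec_count / b.size, reverse=True)
    placed, used = [], 0
    for b in ranked:
        if used + b.size <= region_bytes:
            placed.append(b)
            used += b.size
    return placed


def tagless_fetch_fraction(blocks, placed):
    """Fraction of dynamically fetched bytes that come from the placed blocks."""
    total = sum(b.size * b.exec_count for b in blocks)
    hot = sum(b.size * b.exec_count for b in placed)
    return hot / total if total else 0.0


# Made-up profile data for a toy program.
blocks = [
    Block("loop_body", 64, 1000),
    Block("init", 128, 1),
    Block("helper", 32, 500),
    Block("error_path", 256, 0),
]
placed = place_hot_blocks(blocks, region_bytes=128)
fraction = tagless_fetch_fraction(blocks, placed)
print([b.name for b in placed], round(fraction, 4))
```

In this toy profile, two small hot blocks fit in the region and account for almost all dynamic fetches, which is the kind of skew that would let most accesses skip the tag check.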