Architectural and compiler techniques for energy reduction in high-performance microprocessors

Authors:
Nikolaos Bellas;Ibrahim Hajj;Constantine D. Polychronopoulos;George Stamoulis
Affiliations:
Motorola, Inc., Schaumburg, IL;Univ. of Illinois at Urbana-Champaign, Urbana;Univ. of Illinois at Urbana-Champaign, Urbana;Intel Corp., Santa Clara, CA
Venue:
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special section on low-power electronics and design
Year:
2000

Citing 0
Cited 28

A design framework to efficiently explore energy-delay tradeoffs

Proceedings of the ninth international symposium on Hardware/software codesign
Data and memory optimization techniques for embedded systems

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Sentry tag: an efficient filter scheme for low power cache

CRPIT '02 Proceedings of the seventh Asia-Pacific conference on Computer systems architecture
Memory Architectures for Embedded Systems-On-Chip

HiPC '02 Proceedings of the 9th International Conference on High Performance Computing
Compiler optimization on VLIW instruction scheduling for low power

ACM Transactions on Design Automation of Electronic Systems (TODAES)
A system-level methodology for fast multi-objective design space exploration

Proceedings of the 13th ACM Great Lakes symposium on VLSI
Adaptive mode control: A static-power-efficient cache design

ACM Transactions on Embedded Computing Systems (TECS)
A methodology for the efficient architectural exploration of energy-delay trade-offs for embedded systems

Proceedings of the 2003 ACM symposium on Applied computing
Design and analysis of low-power cache using two-level filter scheme

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Micro-operation cache: a power aware frontend for variable instruction length ISA

IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special section on low power
Multi-objective co-exploration of source code transformations and design space architectures for low-power embedded systems

Proceedings of the 2004 ACM symposium on Applied computing
Code Placement with Selective Cache Activity Minimization for Embedded Real-time Software Design

Proceedings of the 2003 IEEE/ACM international conference on Computer-aided design
HotSpot cache: joint temporal and spatial locality exploitation for i-cache energy reduction

Proceedings of the 2004 international symposium on Low power electronics and design
Power-Performance System-Level Exploration of a MicroSPARC2-Based Embedded Architecture

DATE '03 Proceedings of the conference on Design, Automation and Test in Europe: Designers' Forum - Volume 2
A sink-n-hoist framework for leakage power reduction

Proceedings of the 5th ACM international conference on Embedded software
Compilers for leakage power reduction

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Multi-objective design space exploration of embedded systems

Journal of Embedded Computing - Low-power Embedded Systems
Compilation for compact power-gating controls

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Instruction cache energy saving through compiler way-placement

Proceedings of the conference on Design, automation and test in Europe
Branch target buffer design for embedded processors

Microprocessors & Microsystems
Enabling large decoded instruction loop caching for energy-aware embedded processors

CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
An on-chip instruction cache design with one-bit tag for low-power embedded systems

Microprocessors & Microsystems
Compiler analysis and supports for leakage power reduction on microprocessors

LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
Link-time optimization for power efficiency in a tagless instruction cache

CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Towards a performance- and energy-efficient data filter cache

Proceedings of the 10th Workshop on Optimizations for DSP and Embedded Systems
Power devil: tool for power gating strategy selection

Proceedings of the 10th Workshop on Optimizations for DSP and Embedded Systems
Shrinking L1 instruction caches to improve energy-delay in SMT embedded processors

ARCS'13 Proceedings of the 26th international conference on Architecture of Computing Systems
DLIC: Decoded loop instructions caching for energy-aware embedded processors

ACM Transactions on Embedded Computing Systems (TECS)

Abstract

In this paper, we focus on low-power design techniques for high-performance processors at the architectural and compiler levels, and mainly on methods for reducing the energy dissipated in the on-chip caches. Energy dissipated in caches represents a substantial portion of the energy budget of today's processors. Extrapolating current trends, this portion is likely to increase in the near future, since the devices devoted to the caches occupy an increasingly large percentage of the total chip area. We propose a method that places an additional minicache between the I-Cache and the central processing unit (CPU) core to buffer instructions that are nested within loops and would otherwise be fetched continuously from the I-Cache. This mechanism is combined with compiler-driven code modifications that greatly simplify the required hardware and eliminate unnecessary instruction fetching, and consequently reduce signal switching activity and the dissipated energy. We show that the additional cache, dubbed the L-Cache, is much smaller and simpler than the I-Cache when the compiler assumes the role of allocating instructions to it. Through simulation, we show that for the SPECfp95 benchmarks, the I-Cache remains disabled most of the time, and the "cheaper" extra cache is used instead. We also propose different techniques that are better adapted to nonnumeric, non-loop-intensive code.
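The idea of a compiler-allocated loop buffer can be illustrated with a toy fetch-energy model. This is a minimal sketch, not the authors' simulator: the function name `fetch_energy`, the per-access energy values, and the example trace are all assumptions chosen for illustration. It assumes that the compiler has marked a set of loop-body addresses, that a marked instruction is copied into the small buffer on its first fetch, and that every later fetch of it is served by the buffer while the I-Cache stays idle.

```python
def fetch_energy(trace, lcache_addrs, e_icache=1.0, e_lcache=0.2):
    """Toy model: return (baseline_energy, energy_with_buffer, icache_accesses).

    trace        -- sequence of fetched instruction addresses
    lcache_addrs -- addresses the (hypothetical) compiler pass marked for the buffer
    e_icache, e_lcache -- assumed energy per access, in arbitrary units
    """
    resident = set()        # marked instructions already copied into the buffer
    icache_accesses = 0
    energy = 0.0
    for addr in trace:
        if addr in resident:
            # Served by the small buffer; the I-Cache is not accessed.
            energy += e_lcache
        else:
            # Fetched from the I-Cache.
            energy += e_icache
            icache_accesses += 1
            if addr in lcache_addrs:
                # First fetch of a marked instruction: also pay for the buffer fill.
                energy += e_lcache
                resident.add(addr)
    baseline = e_icache * len(trace)  # every fetch goes to the I-Cache
    return baseline, energy, icache_accesses


# A 4-instruction loop body (addresses 4..7) executed 100 times, with
# 2 instructions before and 2 instructions after the loop.
loop_body = [4, 5, 6, 7]
trace = [0, 1] + loop_body * 100 + [8, 9]
baseline, with_lcache, icache_accesses = fetch_energy(trace, set(loop_body))
print(f"I-Cache accesses: {icache_accesses} of {len(trace)} fetches")
print(f"energy ratio: {with_lcache / baseline:.2f}")
```

In this trace only 8 of the 404 fetches reach the I-Cache, which mirrors the abstract's observation that the I-Cache can stay disabled most of the time for loop-intensive code. The actual savings depend on the real per-access energies and on how much of the execution time is spent in loops that fit in the buffer.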