Addressing instruction fetch bottlenecks by using an instruction register file

Authors:
Stephen Roderick Hines; Gary Tyson; David Whalley
Affiliations:
Florida State University, Tallahassee, FL (all authors)
Venue:
Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Year:
2007

Abstract

The Instruction Register File (IRF) is an architectural extension that provides improved access to frequently occurring instructions. An optimizing compiler can exploit an IRF by packing an application's instructions, which decreases code size, reduces energy consumption, and improves execution time, primarily because of a smaller footprint in the instruction cache. The nature of the IRF also allows the execution of packed instructions to overlap with instruction fetch, providing a means of tolerating increased fetch latencies, such as those introduced by encrypted instruction caches (ICs) or by low-power L0 caches. Although previous research has focused on the direct benefits of instruction packing, this paper explores how the increased fetch bandwidth provided by packed instructions can be used. Small L0 caches improve energy efficiency but can increase execution time because of frequent cache misses. We show that this penalty can be significantly reduced by overlapping the execution of packed instructions with miss stalls. The IRF can also supply additional instructions to a more aggressive execution engine, effectively reducing dependence on instruction cache bandwidth. This can improve energy efficiency and gives additional flexibility for evaluating design tradeoffs in a pipeline with asymmetric instruction bandwidth. Thus, we show that the IRF is a complementary technique: it operates as a buffer that tolerates fetch bottlenecks, and it provides additional fetch bandwidth for an aggressive pipeline backend.
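
The overlap of packed-instruction execution with miss stalls can be illustrated with a toy cycle count. This is a minimal sketch, not the paper's simulator: it assumes a single-issue in-order pipeline, one cycle per executed instruction, and a fixed miss penalty. The function name `total_cycles`, the `(n_instr, misses)` tuple format, and the example stream are all hypothetical and chosen only for illustration.

```python
def total_cycles(stream, miss_penalty, overlap):
    """Count cycles for a sequence of fetched instruction words.

    stream: list of (n_instr, misses) tuples, one per fetched word.
        n_instr is how many instructions the word expands to
        (1 for an ordinary instruction, more for a packed one).
        misses is True if fetching this word misses in the small cache.
    miss_penalty: stall cycles charged for a miss (assumed constant).
    overlap: if True, the fetch of the next word proceeds while the
        remaining instructions of the current packed word execute.
    """
    total = 0
    slack = 0  # cycles of the previous word's execution that can hide a stall
    for n_instr, misses in stream:
        stall = miss_penalty if misses else 0
        if overlap:
            # Hide as much of the stall as the previous word's extra
            # packed instructions allow.
            stall = max(0, stall - slack)
        total += stall + n_instr
        # A word with n instructions leaves n - 1 cycles in which the
        # fetch unit is free to work on the next word.
        slack = n_instr - 1
    return total


# Hypothetical stream: packed words followed by ordinary words that miss.
example_stream = [(4, False), (1, True), (3, False), (1, True)]
baseline = total_cycles(example_stream, miss_penalty=2, overlap=False)
with_irf_overlap = total_cycles(example_stream, miss_penalty=2, overlap=True)
print(baseline, with_irf_overlap)
```

In this toy model, a miss that immediately follows a packed word is partly or fully hidden, while a stream made only of ordinary single-instruction words sees no benefit from overlap.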