A portable global optimizer and linker
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
The filter cache: an energy efficient memory structure
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Improving code density using compression techniques
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
A 160-MHz, 32-b, 0.5-W CMOS RISC microprocessor
Digital Technical Journal
Pipeline gating: speculation control for energy reduction
Proceedings of the 25th annual international symposium on Computer architecture
Enhanced code compression for embedded RISC processors
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
ISLPED '99 Proceedings of the 1999 international symposium on Low power electronics and design
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
Compiler techniques for code compaction
ACM Transactions on Programming Languages and Systems (TOPLAS)
Using dynamic cache management techniques to reduce energy in general purpose processors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special issue on system-level interconnect prediction
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Instruction flow-based front-end throttling for power-aware high-performance processors
ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
Analyzing and compressing assembly code
SIGPLAN '84 Proceedings of the 1984 SIGPLAN symposium on Compiler construction
Asymmetric-frequency clustering: a power-aware back-end for high-performance processors
Proceedings of the 2002 international symposium on Low power electronics and design
DSP Processors Hit the Mainstream
Computer
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Power-Aware Control Speculation through Selective Throttling
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Energy and Performance Improvements in Microprocessor Design Using a Loop Cache
ICCD '99 Proceedings of the 1999 IEEE International Conference on Computer Design
Proceedings of the 30th annual international symposium on Computer architecture
Efficient execution of compressed programs
Efficient execution of compressed programs
Tiny instruction caches for low power embedded systems
ACM Transactions on Embedded Computing Systems (TECS)
Reducing code size with echo instructions
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
High Efficiency Counter Mode Security Architecture via Prediction and Precomputation
Proceedings of the 32nd annual international symposium on Computer Architecture
Improving Program Efficiency by Packing Instructions into Registers
Proceedings of the 32nd annual international symposium on Computer Architecture
Mitigating Amdahl's Law through EPI Throttling
Proceedings of the 32nd annual international symposium on Computer Architecture
Reducing Instruction Fetch Cost by Packing Instructions into RegisterWindows
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
MiBench: A free, commercially representative embedded benchmark suite
WWC '01 Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop
Adapting compilation techniques to enhance the packing of instructions into registers
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Fine-grain dynamic instruction placement for L0 scratch-pad memory
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Hi-index | 0.00 |
The Instruction Register File (IRF) is an architectural extension for providing improved access to frequently occurring instructions. An optimizing compiler can exploit an IRF by packing an application's instructions, resulting in decreased code size, reduced energy consumption and improved execution time primarily due to a smaller footprint in the instruction cache. The nature of the IRF also allows the execution of packed instructions to overlap with instruction fetch, thus providing a means for tolerating increased fetch latencies, like those experienced by encrypted ICs as well as the presence of low-power L0 caches. Although previous research has focused on the direct benefits of instruction packing, this paper explores the use of increased fetch bandwidth provided by packed instructions. Small L0 caches improve energy efficiency but can increase execution time due to frequent cache misses. We show that this penalty can be significantly reduced by overlapping the execution of packed instructions with miss stalls. The IRF can also be used to supply additional instructions to a more aggressive execution engine, effectively reducing dependence on instruction cache bandwidth. This can improve energy efficiency, in addition to providing additional flexibility for evaluating various design tradeoffs in a pipeline with asymmetric instruction bandwidth. Thus, we show that the IRF is a complementary technique, operating as a buffer tolerating fetch bottlenecks, as well as providing additional fetch bandwidth for an aggressive pipeline backend.