Power-efficient instruction delivery through trace reuse

Authors:
Chengmo Yang;Alex Orailoglu
Affiliations:
University of California, San Diego, La Jolla, CA;University of California, San Diego, La Jolla, CA
Venue:
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Year:
2006

Citing 17
Cited 2

The filter cache: an energy efficient memory structure

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
The SimpleScalar tool set, version 2.0

ACM SIGARCH Computer Architecture News
Pipeline gating: speculation control for energy reduction

Proceedings of the 25th annual international symposium on Computer architecture
Dynamic vectorization: a mechanism for exploiting far-flung ILP in ordinary programs

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Reducing power in superscalar processor caches using subbanking, multiple line buffers and bit-line segmentation

ISLPED '99 Proceedings of the 1999 international symposium on Low power electronics and design
Instruction fetch energy reduction using loop caches for embedded applications with small tight loops

ISLPED '99 Proceedings of the 1999 international symposium on Low power electronics and design
Wattch: a framework for architectural-level power analysis and optimizations

Proceedings of the 27th annual international symposium on Computer architecture
A circuit level implementation of an adaptive issue queue for power-aware microprocessors

GLSVLSI '01 Proceedings of the 11th Great Lakes symposium on VLSI
Energy-effective issue logic

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Micro-operation cache: a power aware frontend for the variable instruction length ISA

ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
Instruction flow-based front-end throttling for power-aware high-performance processors

ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
Saving energy with just in time instruction delivery

Proceedings of the 2002 international symposium on Low power electronics and design
Power-Aware Microarchitecture: Design and Modeling Challenges for Next-Generation Microprocessors

IEEE Micro
Energy efficient co-adaptive instruction fetch and issue

Proceedings of the 30th annual international symposium on Computer architecture
Reducing instruction fetch energy with backwards branch control information and buffering

Proceedings of the 2003 international symposium on Low power electronics and design
Execution cache-based microarchitecture power-efficient superscalar processors

IEEE Transactions on Very Large Scale Integration (VLSI) Systems

Low power microarchitecture with instruction reuse

Proceedings of the 5th conference on Computing frontiers
Reusing cached schedules in an out-of-order processor with in-order issue logic

ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design

Quantified Score

Hi-index	0.00

Visualization

Abstract

As power dissipation inexorably becomes the major bottleneck in system integration and reliability, the front-end instruction delivery path in a traditional out-of-order superscalar processor needs to deliver high application performance in an energy-effective manner. This challenge can be addressed by efficiently reusing the work of fetch and decode performed during preceding loop iterations and resident mostly within the processor itself. As a large percentage of the instructions currently under fetch have previously dispatched copies resident in the Reorder Buffer (ROB), in this paper we develop a mechanism to utilize the ROB as a storage location for previously decoded instructions. Thus instructions can be fed directly from the ROB into the rename and issue stages, enabling the gating off of the fetch and decode logic for large periods of time so as to deliver significant power savings. Power and performance criticality of the ROB requires an efficient reuse identification mechanism; we outline such a cost-efficient Reuse Identification Unit (RIU) which enables effective identification of the matches between the ROB entries and the instructions currently under fetch. Simulation results on both multimedia and SPEC 2000 benchmarks confirm that incorporating the proposed technique on traditional out-of-order superscalar processors results in not only a sight improvement in performance, but also significant savings in the overall system power dissipation, achieved within a limited hardware budget.