Balancing Reuse Opportunities and Performance Gains with Subblock Value Reuse

Authors:
Jian Huang; David J. Lilja
Venue:
IEEE Transactions on Computers
Year:
2003

Abstract

Instructions in programs often produce repetitive results, which has motivated researchers to explore techniques such as value prediction and value reuse to exploit this behavior. Value prediction improves the available Instruction-Level Parallelism (ILP) in superscalar processors by predicting the values of input operands so that dependent instructions can execute speculatively. Value reuse, on the other hand, tries to eliminate redundant computation by storing the previously produced results of instructions and skipping the execution of redundant instructions. Previous value reuse mechanisms use a single instruction or a naturally formed instruction group, such as a basic block, a trace, or a function, as the reuse unit. These naturally formed instruction groups are readily identifiable by the hardware at runtime without compiler assistance. However, the performance potential of a value reuse mechanism depends on its reuse detection time, the number of reuse opportunities, and the amount of work saved by skipping each reuse unit. Because larger instruction groups typically offer fewer reuse opportunities than smaller groups but provide a greater benefit for each reuse-detection process, it is important to find the balance point that provides the largest overall performance gain. In this paper, we propose a new mechanism called subblock reuse. Subblocks are created by slicing basic blocks either dynamically or with compiler guidance. The dynamic approaches use the number of instructions, the number of inputs and outputs, or the presence of store instructions to determine the subblock boundaries. The compiler-assisted approach slices basic blocks using data-flow considerations to balance the reuse granularity and the number of reuse opportunities. The results show that subblocks, which can produce up to 36 percent speedup if reused properly, are better candidates for reuse units than basic blocks.
Although subblock reuse with compiler assistance has a substantial and consistent potential to improve the performance of superscalar processors, this scheme is not always the best performer. Subblocks restricted to two consecutive instructions demonstrate surprisingly good performance potential as well.
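
To make the idea of dynamic slicing and reuse concrete, the following is a minimal software sketch, not the hardware mechanism evaluated in the paper. It cuts a basic block into subblocks using the three dynamic criteria the abstract names (instruction count, input and output counts, and store instructions), then memoizes each subblock's outputs keyed by its input values. Everything specific here is an assumption made for illustration: the `Instr` record, the function names, the default limits, the choice to place each store in a subblock of its own, and the simplification that every written register counts as an output.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical three-address instruction used only in this sketch."""
    op: str                 # "add", "sub", "mul", or "store"
    dst: str                # destination register ("" for a store)
    srcs: Tuple[str, ...]   # source registers


def inputs(instrs: List[Instr]) -> List[str]:
    """Registers read before being written inside the group (live-ins)."""
    written, live_in = set(), []
    for ins in instrs:
        for s in ins.srcs:
            if s not in written and s not in live_in:
                live_in.append(s)
        if ins.dst:
            written.add(ins.dst)
    return live_in


def outputs(instrs: List[Instr]) -> List[str]:
    """Registers written inside the group (assumed all live-out here)."""
    outs = []
    for ins in instrs:
        if ins.dst and ins.dst not in outs:
            outs.append(ins.dst)
    return outs


def slice_basic_block(block: List[Instr], max_instrs: int = 2,
                      max_inputs: int = 4,
                      max_outputs: int = 2) -> List[List[Instr]]:
    """Cut a basic block into subblocks under the assumed limits."""
    subblocks, current = [], []
    for ins in block:
        if ins.op == "store":
            # Assumption: a store closes the open subblock and stands alone.
            if current:
                subblocks.append(current)
                current = []
            subblocks.append([ins])
            continue
        candidate = current + [ins]
        too_big = (len(candidate) > max_instrs
                   or len(inputs(candidate)) > max_inputs
                   or len(outputs(candidate)) > max_outputs)
        if current and too_big:
            subblocks.append(current)
            current = [ins]
        else:
            current = candidate
    if current:
        subblocks.append(current)
    return subblocks


OPS = {"add": lambda a, b: a + b,
       "sub": lambda a, b: a - b,
       "mul": lambda a, b: a * b}


def run_with_reuse(sb_id: int, instrs: List[Instr],
                   regs: Dict[str, int], table: dict) -> bool:
    """Run a store-free subblock; return True if its results were reused."""
    key = (sb_id, tuple(regs[r] for r in inputs(instrs)))
    if key in table:
        regs.update(table[key])      # reuse hit: skip execution entirely
        return True
    for ins in instrs:               # reuse miss: execute and record outputs
        regs[ins.dst] = OPS[ins.op](*(regs[s] for s in ins.srcs))
    table[key] = {r: regs[r] for r in outputs(instrs)}
    return False
```

In this sketch, lowering `max_instrs` yields more but smaller subblocks, each easier to match and each saving less work per hit, which is the balance between reuse opportunities and per-reuse gain that the abstract describes. A real design would also have to account for reuse detection latency and table capacity, which are not modeled here.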