Extending Value Reuse to Basic Blocks with Compiler Support

Authors:
Jian Huang;David J. Lilja
Affiliations:
Sun Microsystems, Palo Alto, CA;Univ. of of Minnesota, Minneapolis
Venue:
IEEE Transactions on Computers
Year:
2000

Citing 19
Cited 7

Enhancing instruction scheduling with a block-structured ISA

International Journal of Parallel Programming
The case for a single-chip multiprocessor

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Value locality and load value prediction

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Exceeding the dataflow limit via value prediction

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Improving superscalar instruction dispatch and issue by exploiting dynamic code sequences

Proceedings of the 24th annual international symposium on Computer architecture
Dynamic instruction reuse

Proceedings of the 24th annual international symposium on Computer architecture
Improving the accuracy and performance of memory communication through renaming

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
The predictability of data values

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Highly accurate data value prediction using hybrid predictors

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Advanced compiler design and implementation

Advanced compiler design and implementation
An empirical analysis of instruction repetition

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
The Superthreaded Processor Architecture

IEEE Transactions on Computers
Superspeculative Microarchitecture for Beyond AD 2000

Computer
Trace Processors: Moving to Fourth-Generation Microarchitectures

Computer
Performance Study of a Concurrent Multithreaded Processor

HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
The Potential for Using Thread-Level Data Speculation to Facilitate Automatic Parallelization

HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Virtual-Physical Registers

HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Exploiting Basic Block Value Locality with Block Reuse

HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Caching Function Results: Faster Arithmetic by Avoiding Unnecessary Computation

Caching Function Results: Faster Arithmetic by Avoiding Unnecessary Computation

Better exploration of region-level value locality with integrated computation reuse and value prediction

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Application-driven processor design exploration for power-performance trade-off analysis

Proceedings of the 2001 IEEE/ACM international conference on Computer-aided design
A Prolog Tailoring Technique on an Epilog Tailored Procedure

PSI '02 Revised Papers from the 4th International Andrei Ershov Memorial Conference on Perspectives of System Informatics: Akademgorodok, Novosibirsk, Russia
Balancing Reuse Opportunities and Performance Gains with Subblock Value Reuse

IEEE Transactions on Computers
An embedded multi-resolution AMBA trace analyzer for microprocessor-based SoC integration

Proceedings of the 44th annual Design Automation Conference
Limits for a feasible speculative trace reuse implementation

International Journal of High Performance Systems Architecture
Window memoization: an efficient hardware architecture for high-performance image processing

Journal of Real-Time Image Processing

Quantified Score

Hi-index	14.98

Visualization

Abstract

Speculative execution and instruction reuse are two important strategies that have been investigated for improving processor performance. Value prediction at the instruction level has been introduced to allow even more aggressive speculation and reuse than previous techniques. This study suggests that using compiler support to extend value reuse to a coarser granularity than a single instruction, such as a basic block, may have substantial performance benefits. We investigate the input and output values of basic blocks and find that these values can be quite regular and predictable. For the SPEC benchmark programs evaluated, 90 percent of the basic blocks have fewer than four register inputs, five live register outputs, four memory inputs, and two memory outputs. About 16 to 41 percent of all the basic blocks are simply repeating earlier calculations when the programs are compiled with the -O2 optimization level in the GCC compiler. Compiler optimizations, such as loop-unrolling and function inlining, affect the sizes of basic blocks, but have no significant or consistent impact on their value locality, nor the resulting performance. Based on these results, we evaluate the potential benefit of basic block reuse using a novel mechanism called the block history buffer. This mechanism records input and live output values of basic blocks to provide value reuse at the basic block level. Simulation results show that using a reasonably sized block history buffer to provide basic block reuse in a 4-way issue superscalar processor can improve execution time for the tested SPEC programs by 1 to 14 percent, with an overall average of 9 percent when using reasonable hardware assumptions.