Exploiting Basic Block Value Locality with Block Reuse

Authors:
Jian Huang; David Lilja
Venue:
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Year:
1999


Abstract

Value prediction at the instruction level has been introduced to allow more aggressive speculation and reuse than previous techniques. We investigate the input and output values of basic blocks and find that these values can be quite regular and predictable, suggesting that using compiler support to extend value prediction and reuse to a coarser granularity may have substantial performance benefits. For the SPEC benchmark programs evaluated, 90% of the basic blocks have fewer than 4 register inputs, 5 live register outputs, 4 memory inputs and 2 memory outputs. About 16% to 41% of all the basic blocks are simply repeating earlier calculations when the programs are compiled with the -O2 optimization level in the GCC compiler. We evaluate the potential benefit of basic block reuse using a novel mechanism called a block history buffer. This mechanism records input and live output values of basic blocks to provide value prediction and reuse at the basic block level. Simulation results show that using a reasonably sized block history buffer to provide basic block reuse in a 4-way issue superscalar processor can improve execution time for the tested SPEC programs by 1% to 14% with an overall average of 9%.
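
The reuse idea in the abstract can be illustrated with a small software model. The sketch below is not the paper's hardware design: the class name `BlockHistoryBuffer`, the `execute` method, the LRU replacement policy, the `capacity` parameter, and the example block `block_a` are all assumptions made for illustration. It only shows the core behavior the abstract describes: a buffer that records a basic block's input values together with its live output values, and returns the recorded outputs when the same block is entered again with the same inputs.

```python
from collections import OrderedDict


class BlockHistoryBuffer:
    """Toy model (hypothetical): maps (block address, inputs) to live outputs."""

    def __init__(self, capacity=4):
        # Capacity and LRU replacement are assumptions, not taken from the paper.
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def execute(self, block_addr, inputs, compute):
        """Return the block's live outputs, reusing a recorded result if possible."""
        key = (block_addr, tuple(inputs))
        if key in self.entries:
            # Same block, same input values: skip the computation and reuse.
            self.entries.move_to_end(key)
            self.hits += 1
            return self.entries[key]
        # No matching record: run the block and remember its outputs.
        self.misses += 1
        outputs = tuple(compute(*inputs))
        self.entries[key] = outputs
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return outputs


def block_a(r1, r2):
    # Stand-in for a basic block with two register inputs and two live outputs.
    return (r1 + r2, r1 * r2)


bhb = BlockHistoryBuffer(capacity=2)
results = [bhb.execute(0x400, (3, 4), block_a) for _ in range(3)]
```

In this model the first call executes the block and the next two are served from the buffer. A real mechanism would also have to include memory inputs in the lookup key and handle stores that invalidate recorded entries, which this sketch leaves out.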