Value prediction at the instruction level has been introduced to allow more aggressive speculation and reuse than previous techniques. We investigate the input and output values of basic blocks and find that these values can be quite regular and predictable. This suggests that using compiler support to extend value prediction and reuse to a coarser granularity may yield substantial performance benefits. For the SPEC benchmark programs evaluated, 90% of the basic blocks have fewer than 4 register inputs, 5 live register outputs, 4 memory inputs, and 2 memory outputs. About 16% to 41% of all basic blocks simply repeat earlier calculations when the programs are compiled with GCC at the -O2 optimization level. We evaluate the potential benefit of basic block reuse using a novel mechanism called a block history buffer, which records the input and live output values of basic blocks to provide value prediction and reuse at the basic block level. Simulation results show that using a reasonably sized block history buffer to provide basic block reuse in a 4-way issue superscalar processor can improve execution time for the tested SPEC programs by 1% to 14%, with an overall average of 9%.
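To make the reuse idea concrete, here is a minimal software sketch of a block-history-buffer lookup. It is not the authors' design. The abstract does not specify the buffer's organization, so the following choices are assumptions: indexing entries by basic-block start address, keeping a few (inputs, live outputs) instances per block, and using LRU replacement. The names BlockHistoryBuffer, lookup, record, and run_block are likewise hypothetical. On a hit, every input location (register or memory) matches a recorded instance, and the stored live outputs are returned instead of re-executing the block.

```python
from collections import OrderedDict
from typing import Callable, Dict, Optional, Tuple

# Hypothetical representation: a block's inputs/outputs are location -> value maps,
# e.g. {"r3": 7, "mem[0x1000]": 42}; memory inputs are just extra locations here.
Values = Tuple[Tuple[str, int], ...]


def snapshot(values: Dict[str, int]) -> Values:
    """Canonical, hashable form of a set of location -> value pairs."""
    return tuple(sorted(values.items()))


class BlockHistoryBuffer:
    """Sketch of a block history buffer (assumed organization, not the paper's).

    Entries are indexed by basic-block start address. Each entry keeps up to
    `instances_per_block` (inputs -> live outputs) pairs; whole entries are
    evicted in least-recently-used order once `num_entries` blocks are stored.
    """

    def __init__(self, num_entries: int = 256, instances_per_block: int = 4) -> None:
        self.num_entries = num_entries
        self.instances_per_block = instances_per_block
        self.entries: "OrderedDict[int, OrderedDict[Values, Dict[str, int]]]" = OrderedDict()

    def lookup(self, block_addr: int, inputs: Dict[str, int]) -> Optional[Dict[str, int]]:
        """Return recorded live outputs if the block already ran on these inputs."""
        entry = self.entries.get(block_addr)
        if entry is None:
            return None
        key = snapshot(inputs)
        outputs = entry.get(key)
        if outputs is None:
            return None
        self.entries.move_to_end(block_addr)  # refresh LRU position of the block
        entry.move_to_end(key)  # refresh LRU position of this instance
        return dict(outputs)

    def record(self, block_addr: int, inputs: Dict[str, int], outputs: Dict[str, int]) -> None:
        """Remember the live outputs produced by executing the block on these inputs."""
        entry = self.entries.get(block_addr)
        if entry is None:
            if len(self.entries) >= self.num_entries:
                self.entries.popitem(last=False)  # evict least-recently-used block
            entry = OrderedDict()
            self.entries[block_addr] = entry
        self.entries.move_to_end(block_addr)
        key = snapshot(inputs)
        entry[key] = dict(outputs)
        entry.move_to_end(key)
        if len(entry) > self.instances_per_block:
            entry.popitem(last=False)  # evict least-recently-used instance


def run_block(bhb: BlockHistoryBuffer, block_addr: int, inputs: Dict[str, int],
              execute: Callable[[Dict[str, int]], Dict[str, int]]) -> Tuple[Dict[str, int], bool]:
    """Reuse recorded outputs on a hit; otherwise execute the block and record it."""
    cached = bhb.lookup(block_addr, inputs)
    if cached is not None:
        return cached, True
    outputs = execute(inputs)
    bhb.record(block_addr, inputs, outputs)
    return outputs, False


if __name__ == "__main__":
    bhb = BlockHistoryBuffer(num_entries=2, instances_per_block=2)
    add_block = lambda regs: {"r5": regs["r3"] + regs["r4"]}  # toy block: r5 = r3 + r4
    print(run_block(bhb, 0x400, {"r3": 1, "r4": 2}, add_block))  # ({'r5': 3}, False)
    print(run_block(bhb, 0x400, {"r3": 1, "r4": 2}, add_block))  # ({'r5': 3}, True)
```

Because a hit requires every recorded input location to hold the same value, skipping the block is safe in this model. The abstract's observation that most blocks have only a handful of register and memory inputs and outputs is what makes fixed-size hardware entries of this kind plausible.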