Assigning confidence to conditional branch predictions
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Low power data processing by elimination of redundant computations
ISLPED '97 Proceedings of the 1997 international symposium on Low power electronics and design
Proceedings of the 24th annual international symposium on Computer architecture
Accelerating multi-media processing by implementing memoing in multiplication and division units
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Dynamic removal of redundant computations
ICS '99 Proceedings of the 13th international conference on Supercomputing
Compiler-directed dynamic computation reuse: rationale and initial results
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Improving Processor Performance by Simplifying and Bypassing Trivial Computations
ICCD '02 Proceedings of the 2002 IEEE International Conference on Computer Design: VLSI in Computers and Processors (ICCD'02)
IBM Journal of Research and Development
MinneSPEC: A New SPEC Benchmark Workload for Simulation-Based Computer Architecture Research
IEEE Computer Architecture Letters
WWC '01 Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop
POWER3: the next generation of PowerPC processors
IBM Journal of Research and Development
POWER4 system microarchitecture
IBM Journal of Research and Development
Proceedings of the 31st annual international symposium on Computer architecture
Reexecution and Selective Reuse in Checkpoint Processors
Transactions on High-Performance Embedded Architectures and Compilers II
Window memoization: an efficient hardware architecture for high-performance image processing
Journal of Real-Time Image Processing
Hi-index | 0.00 |
Instruction reuse and memoization exploit the fact that during a program run there are operations that execute more than once with the same operand values. By saving previous occurrences of instructions (operands and result) in dedicated, on-chip lookup tables, it is possible to avoid re-execution of these instructions. This has been shown to be efficient in a naive model that assumes single-cycle table lookup. We now extend the analysis to consider the energy, area, and timing overheads of maintaining such tables. We show that reuse opportunities abound in the SPEC CPU2000 benchmark suite, and that by judiciously selecting table configurations it is possible to exploit these opportunities with a minimal penalty. Energy consumption can be further reduced by employing confidence counters, which enable instructions that have a history of failed memoizations to be filtered out. We conclude by identifying those instructions that profit most from memoization, and the conditions under which it is beneficial.