"Look it up" or "do the math": an energy, area, and timing analysis of instruction reuse and memoization

Authors:
Daniel Citron;Dror G. Feitelson
Affiliations:
IBM Haifa Labs, Haifa University Campus, Haifa, Israel;School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
Venue:
PACS'03 Proceedings of the Third international conference on Power-Aware Computer Systems
Year:
2003

Abstract

Instruction reuse and memoization exploit the fact that during a program run there are operations that execute more than once with the same operand values. By saving previous occurrences of instructions (operands and result) in dedicated, on-chip lookup tables, it is possible to avoid re-execution of these instructions. This has been shown to be efficient in a naive model that assumes single-cycle table lookup. We now extend the analysis to consider the energy, area, and timing overheads of maintaining such tables. We show that reuse opportunities abound in the SPEC CPU2000 benchmark suite, and that by judiciously selecting table configurations it is possible to exploit these opportunities with a minimal penalty. Energy consumption can be further reduced by employing confidence counters, which enable instructions that have a history of failed memoizations to be filtered out. We conclude by identifying those instructions that profit most from memoization, and the conditions under which it is beneficial.
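
The mechanism the abstract describes can be illustrated with a small software model. This is only a minimal sketch under stated assumptions, not the authors' hardware design: the class name `MemoTable`, the direct-mapped indexing function, the table size, the 2-bit saturating counter, and the filtering threshold are all hypothetical choices made for illustration, and operands are assumed to be integers.

```python
class MemoTable:
    """Toy model of a memo table with per-instruction confidence counters.

    All parameters are illustrative assumptions, not values from the paper.
    """

    def __init__(self, num_entries=64, conf_max=3, conf_threshold=2):
        self.num_entries = num_entries
        self.conf_max = conf_max              # ceiling of the saturating counter
        self.conf_threshold = conf_threshold  # below this, the lookup is skipped
        self.entries = [None] * num_entries   # each slot holds (tag, result)
        self.confidence = {}                  # instruction address -> counter
        self.lookups = 0
        self.hits = 0
        self.filtered = 0

    def execute(self, pc, opcode, a, b, compute):
        """Return the result of `opcode` on (a, b), reusing a stored result if possible."""
        conf = self.confidence.get(pc, self.conf_max)
        if conf < self.conf_threshold:
            # History of failed memoizations: bypass the table entirely,
            # which models saving the energy of a useless lookup.
            # In this sketch a filtered instruction is never re-enabled.
            self.filtered += 1
            return compute(a, b)

        tag = (opcode, a, b)
        index = (a * 31 + b) % self.num_entries  # assumed direct-mapped index
        self.lookups += 1
        slot = self.entries[index]
        if slot is not None and slot[0] == tag:
            # Same operation with the same operand values: reuse the result.
            self.hits += 1
            self.confidence[pc] = min(conf + 1, self.conf_max)
            return slot[1]

        # Miss: do the math, store the result, and lower the confidence.
        result = compute(a, b)
        self.entries[index] = (tag, result)
        self.confidence[pc] = max(conf - 1, 0)
        return result


def multiply(x, y):
    return x * y


table = MemoTable()
# An instruction that repeats its operands benefits from the table.
table.execute(0x10, "mul", 6, 7, multiply)
table.execute(0x10, "mul", 6, 7, multiply)
# An instruction whose operands never repeat is filtered after two misses.
for i in range(5):
    table.execute(0x20, "mul", i, i + 100, multiply)
print(table.lookups, table.hits, table.filtered)
```

In this model the instruction at address `0x10` hits on its second execution, while the instruction at `0x20` misses twice and is then filtered out, so its remaining executions no longer pay for a table lookup. Energy, area, and timing costs are not modeled here; the counters only record how often the table is accessed.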