Increasing Instruction-Level Parallelism with Instruction Precomputation (Research Note)

Authors:
Joshua J. Yi;Resit Sendag;David J. Lilja
Affiliations:
-;-;-
Venue:
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Year:
2002

Citing 3
Cited 5

Dynamic instruction reuse

Proceedings of the 24th annual international symposium on Computer architecture
Dynamic removal of redundant computations

ICS '99 Proceedings of the 13th international conference on Supercomputing
Adapting the SPEC 2000 benchmark suite for simulation-based computer architecture research

Workload characterization of emerging computer applications

A Statistically Rigorous Approach for Improving Simulation Methodology

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
A Complexity-Effective Approach to ALU Bandwidth Enhancement for Instruction-Level Temporal Redundancy

Proceedings of the 31st annual international symposium on Computer architecture
Improving Computer Architecture Simulation Methodology by Adding Statistical Rigor

IEEE Transactions on Computers
Reexecution and Selective Reuse in Checkpoint Processors

Transactions on High-Performance Embedded Architectures and Compilers II
Instruction precomputation with memoization for fault detection

Proceedings of the Conference on Design, Automation and Test in Europe

Quantified Score

Hi-index	0.00

Visualization

Abstract

Value reuse improves a processor's performance by dynamically caching the results of previous instructions and reusing those results to bypass the execution of future instructions that have the same opcode and input operands. However, continually replacing the least recently used entries could eventually fill the value reuse table with instructions that are not frequently executed. Furthermore, the complex hardware that replaces entries and updates the table may necessitate an increase in the clock period. We propose instruction precomputation to address these issues by profiling programs to determine the opcodes and input operands that have the highest frequencies of execution. These instructions then are loaded into the precomputation table before the program executes. During program execution, the precomputation table is used in the same way as the value reuse table is, with the exception that the precomputation table does not dynamically replace any entries. For a 2K-entry pre-computation table implemented on a 4-way issue machine, this approach produced an average speedup of 11.0%. By comparison, a 2K-entry value reuse table produced an average speedup of 6.7%. Instruction precomputation outperforms value reuse, especially for smaller tables, with the same number of table entries while using less area and having a lower access time.