Increasing Instruction-Level Parallelism with Instruction Precomputation (Research Note)

  • Authors:
  • Joshua J. Yi;Resit Sendag;David J. Lilja

  • Affiliations:
  • -;-;-

  • Venue:
  • Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
  • Year:
  • 2002

Quantified Score

Hi-index 0.00

Visualization

Abstract

Value reuse improves a processor's performance by dynamically caching the results of previous instructions and reusing those results to bypass the execution of future instructions that have the same opcode and input operands. However, continually replacing the least recently used entries could eventually fill the value reuse table with instructions that are not frequently executed. Furthermore, the complex hardware that replaces entries and updates the table may necessitate an increase in the clock period. We propose instruction precomputation to address these issues by profiling programs to determine the opcodes and input operands that have the highest frequencies of execution. These instructions then are loaded into the precomputation table before the program executes. During program execution, the precomputation table is used in the same way as the value reuse table is, with the exception that the precomputation table does not dynamically replace any entries. For a 2K-entry pre-computation table implemented on a 4-way issue machine, this approach produced an average speedup of 11.0%. By comparison, a 2K-entry value reuse table produced an average speedup of 6.7%. Instruction precomputation outperforms value reuse, especially for smaller tables, with the same number of table entries while using less area and having a lower access time.