A Complexity-Effective Approach to ALU Bandwidth Enhancement for Instruction-Level Temporal Redundancy

  • Authors:
  • Angshuman Parashar;Sudhanva Gurumurthi;Anand Sivasubramaniam

  • Affiliations:
  • The Pennsylvania State University;The Pennsylvania State University;The Pennsylvania State University

  • Venue:
  • Proceedings of the 31st Annual International Symposium on Computer Architecture
  • Year:
  • 2004

Abstract

Previous proposals for implementing instruction-level temporal redundancy in out-of-order cores have reported a performance degradation of up to 45% in certain applications compared to an execution which does not have any temporal redundancy. An important contributor to this problem is the insufficient number of ALUs for handling the amplified load injected into the core. At the same time, increasing the number of ALUs can increase the complexity of the issue logic, which has been pointed out to be one of the most timing-critical components of the processor. This paper proposes a novel extension of a prior idea on instruction reuse to ease ALU bandwidth requirements in a complexity-effective way by exploiting certain interesting properties of a dual (temporally redundant) instruction stream. We present microarchitectural extensions necessary for implementing an instruction reuse buffer (IRB) and integrating this with the issue logic of a dual instruction stream superscalar core, and conduct extensive evaluations to demonstrate how well it can alleviate the ALU bandwidth problem. We show that on average we can gain back nearly 50% of the IPC loss that occurred due to ALU bandwidth limitations for an instruction-level temporally redundant superscalar execution, and 23% of the overall IPC loss.