Removing communications in clustered microarchitectures through instruction replication

  • Authors:
  • Alex Aletà; Josep M. Codina; Antonio González; David Kaeli

  • Affiliations:
  • UPC, Barcelona, Spain; UPC, Barcelona, Spain; Intel Labs Barcelona, UPC, Barcelona, Spain; Northeastern University, Boston, MA

  • Venue:
  • ACM Transactions on Architecture and Code Optimization (TACO)
  • Year:
  • 2004

Abstract

The need to communicate values between clusters can cause a significant performance loss in clustered microarchitectures. In this work, we describe an optimization technique that removes communications by selectively replicating an appropriate set of instructions. Replication must be applied carefully, because the extra instructions increase contention for processor resources and can therefore degrade performance. The proposed scheme is built on top of a previously proposed state-of-the-art modulo-scheduling algorithm. Although this algorithm has been shown to be very effective at reducing communications, results show that replication further decreases the number of communications by around one-third, which results in a significant speedup. IPC increases by 25% on average for a four-cluster microarchitecture and by as much as 70% for selected programs. We also show that replicating appropriate sets of instructions is more effective than doubling the bandwidth of the intercluster connection network.