Dynamically managing the communication-parallelism trade-off in future clustered processors

  • Authors:
  • Rajeev Balasubramonian;Sandhya Dwarkadas;David H. Albonesi

  • Affiliations:
  • University of Rochester, Rochester, NY;University of Rochester, Rochester, NY;University of Rochester, Rochester, NY

  • Venue:
  • Proceedings of the 30th annual international symposium on Computer architecture
  • Year:
  • 2003

Quantified Score

Hi-index 0.01

Visualization

Abstract

Clustered microarchitectures are an attractive alternative to large monolithic superscalar designs due to their potential for higher clock rates in the face of increasingly wire-delay-constrained process technologies. As increasing transistor counts allow an increase in the number of clusters, thereby allowing more aggressive use of instruction-level parallelism (ILP), the inter-cluster communication increases as data values get spread across a wider area. As a result of the emergence of this trade-off between communication and parallelism, a subset of the total on-chip clusters is optimal for performance. To match the hardware to the application's needs, we use a robust algorithm to dynamically tune the clustered architecture. The algorithm, which is based on program metrics gathered at periodic intervals, achieves an 11% performance improvement on average over the best statically defined architecture. We also show that the use of additional hardware and reconfiguration at basic block boundaries can achieve average improvements of 15%. Our results demonstrate that reconfiguration provides an effective solution to the communication and parallelism trade-off inherent in the communication-bound processors of the future.