Microarchitectural Trade-Offs in the Design of a Scalable Clustered Microprocessor

  • Authors:
  • R. Balasubramonian;S. Dwarkadas;D. H. Albonesi

  • Venue:
  • Microarchitectural Trade-Offs in the Design of a Scalable Clustered Microprocessor
  • Year:
  • 2002

Abstract

Clustered microarchitectures are an attractive alternative to large monolithic superscalar designs due to their potential for higher clock rates in the face of increasingly wire-delay-constrained process technologies. In such a microarchitecture, distributing the functional units, register files, and issue queues across multiple clusters reduces the latency of various cycle-time-critical paths, thereby enabling a faster clock. However, a penalty in instructions per cycle is incurred if instructions frequently communicate values among clusters because of dependences.

In this paper, we propose several novel extensions that significantly improve the performance of clustered designs. First, we explore a word-interleaved clustered cache in which memory instructions are steered to clusters based on their addresses and, when the effective address is unknown, are directed to the appropriate cluster via bank prediction. We then study the scalability of the resulting clustered microarchitecture as the number of clusters is increased (resulting in a corresponding increase in inter-cluster communication latency). Our evaluation identifies the key bottlenecks and shows how novel enhancements to the cluster resource allocation mechanisms can significantly improve the scalability of the design. We also show that communication latency in a highly clustered processor can be reduced for certain programs by using only a subset of the clusters. Overall, these enhancements achieve a 30% [fill in the correct value] improvement over a baseline design with the clustered cache.
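The steering idea behind the word-interleaved cache can be illustrated with a minimal Python sketch. It is not the authors' implementation: the word size, the table size, the `LastBankPredictor` class, and the last-bank prediction policy are all assumptions chosen for illustration, since the abstract does not specify how the bank predictor works.

```python
from typing import Optional

WORD_BYTES = 8  # assumed word size; not taken from the paper


def bank_of(address: int, num_clusters: int, word_bytes: int = WORD_BYTES) -> int:
    """Word-interleaved mapping: consecutive words map to consecutive clusters."""
    return (address // word_bytes) % num_clusters


class LastBankPredictor:
    """Hypothetical predictor: guess the bank that this PC accessed last time."""

    def __init__(self, num_clusters: int, table_size: int = 1024):
        self.num_clusters = num_clusters
        self.table = [0] * table_size  # every entry initially predicts bank 0

    def _index(self, pc: int) -> int:
        return pc % len(self.table)

    def predict(self, pc: int) -> int:
        return self.table[self._index(pc)]

    def update(self, pc: int, address: int) -> None:
        # Train with the bank actually accessed once the address resolves.
        self.table[self._index(pc)] = bank_of(address, self.num_clusters)


def steer(pc: int, address: Optional[int], predictor: LastBankPredictor) -> int:
    """Pick a cluster for a memory instruction.

    `address` is None when the effective address is unknown at steering
    time; in that case the predicted bank is used instead.
    """
    if address is not None:
        return bank_of(address, predictor.num_clusters)
    return predictor.predict(pc)
```

In this sketch, a misprediction would simply be detected when the address resolves and `update` is called; how the processor recovers from a wrong bank guess is outside what the abstract describes and is not modeled here.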