Flexible Compiler-Managed L0 Buffers for Clustered VLIW Processors

  • Authors:
  • Enric Gibert; Jesús Sánchez; Antonio González

  • Affiliations:
  • Department of Computer Architecture, Universitat Politècnica de Catalunya, Barcelona - SPAIN;Intel Barcelona Research Center, Intel Labs - Universitat Politècnica de Catalunya, Barcelona - SPAIN;Department of Computer Architecture, Universitat Politècnica de Catalunya, Barcelona - SPAIN and Intel Barcelona Research Center, Intel Labs - Universitat Politècnica de Catalunya, Barce ...

  • Venue:
  • Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
  • Year:
  • 2003

Abstract

Wire delays are a major concern for current and forthcoming processors. One approach to attack this problem is to divide the processor into semi-independent units referred to as clusters. A cluster usually consists of a local register file and a subset of the functional units, while the data cache remains centralized. However, as technology evolves, the latency of such a centralized cache will increase, leading to a significant performance impact. In this paper we propose to include flexible low-latency buffers in each cluster in order to reduce the performance impact of higher cache latencies. The reduced number of entries in each buffer permits the design of flexible ways to map data from L1 to these buffers. The proposed L0 buffers are managed by the compiler, which is responsible for deciding which memory instructions make use of them. Effective instruction scheduling techniques are proposed to generate code that exploits these buffers. Results for the Mediabench benchmark suite show that the performance of a clustered VLIW processor with a unified L1 data cache is improved by 16% when such buffers are used. In addition, the proposed architecture also shows significant advantages over both MultiVLIW processors and clustered processors with a word-interleaved cache, two state-of-the-art designs with a distributed L1 data cache.