Dynamic capacity-speed tradeoffs in SMT processor caches

  • Authors:
  • Sonia López; Steve Dropsho; David H. Albonesi; Oscar Garnica; Juan Lanchares

  • Affiliations:
  • Departamento de Arquitectura de Computadores y Automatica, U. Complutense de Madrid, Spain; School of Computer and Communication Science, EPFL, Switzerland; Computer Systems Laboratory, Cornell University; Departamento de Arquitectura de Computadores y Automatica, U. Complutense de Madrid, Spain; Departamento de Arquitectura de Computadores y Automatica, U. Complutense de Madrid, Spain

  • Venue:
  • HiPEAC'07 Proceedings of the 2nd international conference on High performance embedded architectures and compilers
  • Year:
  • 2007

Abstract

Caches are designed to provide the best tradeoff between access speed and capacity for a set of target applications. Unfortunately, different applications, and even different phases within the same application, may require a different capacity-speed tradeoff. This problem is exacerbated in a Simultaneous Multi-Threaded (SMT) processor, where the optimal cache design may vary drastically with the number of running threads and their characteristics. We propose to make this capacity-speed cache tradeoff dynamic within an SMT core. We extend a previously proposed globally asynchronous, locally synchronous (GALS) processor core with multi-threaded support, and implement dynamically resizable instruction and data caches. As the number of threads and their characteristics change, these adaptive caches automatically adjust between small configurations with fast access times and higher-capacity configurations. Small, fast caches perform best when the core runs a single thread or a dual-thread workload with modest cache requirements, whereas higher-capacity caches work best for most multi-threaded workloads. The use of a GALS microarchitecture permits the rest of the processor, namely the execution core, to run at full speed irrespective of the cache speeds. This approach yields an overall performance improvement of 24.7% over the best fixed-size caches for dual-thread workloads, and 19.2% for single-threaded applications.
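Why the preferred configuration can flip with the number of threads can be illustrated with the standard average memory access time (AMAT) model: AMAT = hit time + miss rate × miss penalty. The sketch below is not the resizing controller used in the paper. The configuration names, hit latencies, miss rates and miss penalty are all hypothetical values chosen only for illustration. It shows how a rise in miss rate, caused by threads sharing the cache, can make a larger but slower cache preferable to a small, fast one. The model also ignores the GALS aspect, in which the execution core keeps its own clock regardless of cache speed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheConfig:
    name: str
    hit_latency_ns: float  # hypothetical access time of this size/speed point


def amat_ns(cfg: CacheConfig, miss_rate: float, miss_penalty_ns: float) -> float:
    # Average memory access time = hit time + miss rate * miss penalty
    return cfg.hit_latency_ns + miss_rate * miss_penalty_ns


def pick_config(configs, miss_rates, miss_penalty_ns):
    # miss_rates maps config name -> miss rate estimated for the current interval
    # (hypothetical input; a real design must estimate this at run time)
    return min(configs, key=lambda c: amat_ns(c, miss_rates[c.name], miss_penalty_ns))


# Hypothetical configurations and workload characteristics, for illustration only
SMALL_FAST = CacheConfig("small-fast", hit_latency_ns=1.0)
LARGE_SLOW = CacheConfig("large-slow", hit_latency_ns=1.6)
CONFIGS = [SMALL_FAST, LARGE_SLOW]
MISS_PENALTY_NS = 20.0

# With one thread, the small cache already captures most of the working set
single_thread = {"small-fast": 0.02, "large-slow": 0.015}
# With two threads sharing the cache, the small configuration misses far more often
dual_thread = {"small-fast": 0.08, "large-slow": 0.03}

if __name__ == "__main__":
    for label, rates in [("1 thread", single_thread), ("2 threads", dual_thread)]:
        best = pick_config(CONFIGS, rates, MISS_PENALTY_NS)
        print(label, "->", best.name)
```

With these made-up numbers, the single-thread case gives 1.4 ns for the small cache versus 1.9 ns for the large one. The dual-thread case gives 2.6 ns versus 2.2 ns, so the choice reverses once sharing inflates the small cache's miss rate.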