Dynamic capacity-speed tradeoffs in SMT processor caches

Authors:
Sonia López;Steve Dropsho;David H. Albonesi;Oscar Garnica;Juan Lanchares
Affiliations:
Departamento de Arquitectura de Computadores y Automatica, U. Complutense de Madrid, Spain;School of Computer and Communication Science, EPFL, Switzerland;Computer Systems Laboratory, Cornell University;Departamento de Arquitectura de Computadores y Automatica, U. Complutense de Madrid, Spain;Departamento de Arquitectura de Computadores y Automatica, U. Complutense de Madrid, Spain
Venue:
HiPEAC'07 Proceedings of the 2nd international conference on High performance embedded architectures and compilers
Year:
2007

Citing 15
Cited 7

Simultaneous multithreading: maximizing on-chip parallelism

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Dynamic IPC/clock rate optimization

Proceedings of the 25th annual international symposium on Computer architecture
Selective cache ways: on-demand cache resource allocation

Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Memory hierarchy reconfiguration for energy and performance in general-purpose processor architectures

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Power and performance evaluation of globally asynchronous locally synchronous processors

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Reducing set-associative cache energy via way-prediction and selective direct-mapping

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Integrating Adaptive On-Chip Storage Structures for Reduced Dynamic Power

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Dynamic frequency and voltage control for a multiple clock domain microarchitecture

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Interfacing Synchronous and Asynchronous Modules Within a High-Speed Pipeline

ARVLSI '97 Proceedings of the 17th Conference on Advanced Research in VLSI (ARVLSI '97)
Profile-based dynamic voltage and frequency scaling for a multiple clock domain microprocessor

Proceedings of the 30th annual international symposium on Computer architecture
Energy-Efficient Processor Design Using Multiple Clock Domains with Dynamic Voltage and Frequency Scaling

HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Dynamically Trading Frequency for Complexity in a GALS Microprocessor

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Dynamically Controlled Resource Allocation in SMT Processors

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
A High Performance, Energy Efficient GALS ProcessorMicroarchitecture with Reduced Implementation Complexity

ISPASS '05 Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005

Stochastic modeling of a thermally-managed multi-core system

Proceedings of the 45th annual Design Automation Conference
Improving SMT performance: an application of genetic algorithms to configure resizable caches

Proceedings of the 11th Annual Conference Companion on Genetic and Evolutionary Computation Conference: Late Breaking Papers
Integrated CPU cache power management in multiple clock domain processors

HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers
Compiler techniques for reducing data cache miss rate on a multithreaded architecture

HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers
Data layout for cache performance on a multithreaded architecture

Transactions on high-performance embedded architectures and compilers III
Simulating a LAGS processor to consider variable latency on L1 D-Cache

Proceedings of the 2010 Summer Computer Simulation Conference
A phase adaptive cache hierarchy for SMT processors

Microprocessors & Microsystems

Quantified Score

Hi-index	0.00

Visualization

Abstract

Caches are designed to provide the best tradeoff between access speed and capacity for a set of target applications. Unfortunately, different applications, and even different phases within the same application, may require a different capacity-speed tradeoff. This problem is exacerbated in a Simultaneous Multi-Threaded (SMT) processor where the optimal cache design may vary drastically with the number of running threads and their characteristics. We propose to make this capacity-speed cache tradeoff dynamic within an SMT core. We extend a previously proposed globally asynchronous, locally synchronous (GALS) processor core with multi-threaded support, and implement dynamically resizable instruction and data caches. As the number of threads and their characteristics change, these adaptive caches automatically adjust from small sizes with fast access times to higher capacity configurations. While the former is more performance-optimal when the core runs a single thread, or a dual-thread workload with modest cache requirements, higher capacity caches work best with most multiple thread workloads. The use of a GALS microarchitecture permits the rest of the processor, namely the execution core, to run at full speed irrespective of the cache speeds. This approach yields an overall performance improvement of 24.7% over the best fixed-size caches for dual-thread workloads, and 19.2% for single-threaded applications.