Autonomic management of adaptive microarchitectures

Authors:
Ashutosh Sham Dhodapkar; James E. Smith
Affiliations:
The University of Wisconsin - Madison; The University of Wisconsin - Madison
Venue:
Autonomic management of adaptive microarchitectures
Year:
2004

Abstract

Microarchitectural resource requirements vary across programs and across phases within a program. Adaptive microarchitectures can adjust to these changing requirements to provide better power/performance characteristics. The efficiency of the tuning algorithm that governs the adaptation process is key to realizing the benefits of such microarchitectures. We propose a class of generic tuning algorithms that use program phase information to guide the tuning process. These algorithms improve upon previously proposed algorithms by reducing unnecessary tunings and reconfigurations, which are the sources of performance loss associated with tuning. Phase changes are detected dynamically using a lightweight profiling mechanism called the instruction working set signature. The signature is a lossy-compressed representation of the working set, 32–128 B in size. In addition to detecting phase changes, signatures can be used to estimate the working set size and to identify recurring phases. We propose three signature-based tuning algorithms: the basic tuning algorithm, the signature-density-based algorithm, and the history-based algorithm. Each of these algorithms triggers tuning only when a phase change is detected. The basic algorithm searches for the best configuration by trial and error. The signature density algorithm directly configures certain units using a working set size estimate derived from the signature density. The history algorithm is similar to the basic algorithm but reuses configuration information for recurring phases, which reduces the number of trial-and-error searches and the associated performance loss. The best-performing algorithms achieve resource savings of 53%, 30%, 18%, and 48% for the I-cache, D-cache, L2 cache, and branch predictor, respectively, with a performance loss of 1%. We also propose an algorithm for managing adaptive microarchitectures with multiple configurable units.
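The signature mechanism and the phase-triggered tuning algorithms described above can be sketched in a few lines of Python. This is a minimal, non-authoritative sketch: the 1024-bit (128 B) signature, the 64-byte block granularity, the BLAKE2 hash, the 0.5 phase-change threshold, the relative-distance metric, the linear-counting size estimate, the "smallest configuration within the tolerance of the largest" selection rule, and the `measure_ipc` callback are all assumptions for illustration, not details confirmed by this abstract.

```python
import hashlib
import math

SIG_BITS = 1024        # hypothetical: 1024 bits = 128 B, top of the 32-128 B range
BLOCK_BYTES = 64       # hypothetical: hash instruction addresses per 64-byte block
PHASE_THRESHOLD = 0.5  # hypothetical: relative distance that signals a phase change


def _bit_index(addr):
    """Hash the block containing `addr` to one signature bit (hypothetical hash)."""
    block = addr // BLOCK_BYTES
    digest = hashlib.blake2b(block.to_bytes(8, "little"), digest_size=8).digest()
    return int.from_bytes(digest, "little") % SIG_BITS


def make_signature(instr_addrs):
    """Lossy-compress one interval's instruction working set into a bit vector."""
    sig = 0
    for addr in instr_addrs:
        sig |= 1 << _bit_index(addr)
    return sig


def relative_distance(s1, s2):
    """|s1 XOR s2| / |s1 OR s2|: 0.0 for identical signatures, 1.0 for disjoint ones."""
    union = s1 | s2
    if union == 0:
        return 0.0
    return bin(s1 ^ s2).count("1") / bin(union).count("1")


def estimate_ws_size(sig):
    """Estimate distinct blocks from signature density (linear-counting estimate)."""
    ones = bin(sig).count("1")
    if ones >= SIG_BITS:
        return float("inf")  # saturated signature: too small to estimate the size
    return math.log(1.0 - ones / SIG_BITS) / math.log(1.0 - 1.0 / SIG_BITS)


def density_config(sig, sizes_in_blocks):
    """Signature density idea: smallest size (ascending list) that holds the estimate."""
    need = estimate_ws_size(sig)
    return next((s for s in sizes_in_blocks if s >= need), sizes_in_blocks[-1])


class HistoryTuner:
    """Tune only on phase changes; reuse the stored configuration for recurring phases."""

    def __init__(self, configs, measure_ipc, tolerance=0.01):
        self.configs = configs          # candidate configurations, smallest to largest
        self.measure_ipc = measure_ipc  # hypothetical callback: config -> IPC of a trial run
        self.tolerance = tolerance      # allowed loss relative to the largest config
        self.history = []               # (signature, chosen config) per phase seen so far
        self.prev_sig = None
        self.current = configs[-1]
        self.searches = 0               # trial-and-error searches performed so far

    def _search(self):
        """Basic algorithm: try every config, keep the smallest within tolerance."""
        self.searches += 1
        ipcs = [self.measure_ipc(c) for c in self.configs]
        target = (1.0 - self.tolerance) * ipcs[-1]
        return next(c for c, ipc in zip(self.configs, ipcs) if ipc >= target)

    def on_interval(self, sig):
        """Called once per profiling interval with that interval's signature."""
        same_phase = (self.prev_sig is not None
                      and relative_distance(sig, self.prev_sig) <= PHASE_THRESHOLD)
        self.prev_sig = sig
        if same_phase:
            return self.current         # no phase change: no tuning, no reconfiguration
        for old_sig, cfg in self.history:
            if relative_distance(sig, old_sig) <= PHASE_THRESHOLD:
                self.current = cfg      # recurring phase: reuse, skip the search
                return cfg
        self.current = self._search()   # new phase: fall back to trial and error
        self.history.append((sig, self.current))
        return self.current
```

In this sketch a stable phase costs one distance computation, and a recurring phase costs a scan of the history table instead of a full trial-and-error search, which is the kind of saving the abstract attributes to the history algorithm.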
The multi-unit algorithm uses a novel apportioning technique to decouple the tuning processes of individual units while still meeting a tight performance loss tolerance. It achieves resource savings of 25%, 17%, 9%, and 30% for the I-cache, D-cache, L2 cache, and branch predictor, respectively, with a performance loss of 1.5%. We propose using co-designed virtual machine software to implement the tuning algorithms. Based on full-system simulation, we conclude that such a software implementation is perfectly viable, incurring less than 0.3% performance loss.
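The abstract does not spell out the apportioning technique, so the following is only a hedged illustration of the general idea of decoupling: a global performance-loss budget is divided into per-unit shares, and each unit is then tuned independently against its own share. The multiplicative composition of slowdowns, the weighting scheme, the 1.5% example budget, and the functions `apportion` and `tune_unit` are hypothetical and are not taken from the thesis.

```python
import math


def apportion(total_tolerance, weights):
    """Split a total performance-loss tolerance among units (hypothetical scheme).

    Assumes per-unit slowdowns are independent and compose multiplicatively, so the
    shares satisfy prod(1 - share_i) == 1 - total_tolerance. A larger weight gives a
    unit a larger share of the budget.
    """
    total_w = float(sum(weights))
    log_budget = math.log1p(-total_tolerance)  # log(1 - total_tolerance)
    # share_i = 1 - (1 - total_tolerance) ** (w_i / total_w)
    return [-math.expm1(log_budget * w / total_w) for w in weights]


def tune_unit(configs, measure_ipc, share):
    """Tune one unit on its own: smallest config (ascending list) within its share."""
    ipcs = [measure_ipc(c) for c in configs]
    target = (1.0 - share) * ipcs[-1]
    return next(c for c, ipc in zip(configs, ipcs) if ipc >= target)


# Example: four units with equal weights and a 1.5% overall budget (illustrative only).
units = ["icache", "dcache", "l2", "bpred"]
shares = dict(zip(units, apportion(0.015, [1, 1, 1, 1])))
```

Because each unit only needs its own share, the units can be tuned separately without searching the cross product of their configurations; whether the thesis divides the budget this way is not stated in the abstract.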