System-Level Dynamic Thermal Management for High-Performance Microprocessors

  • Authors:
  • A. Kumar;Li Shang;Li-Shiuan Peh;N. K. Jha

  • Affiliations:
  • Princeton Univ., Princeton;-;-;-

  • Venue:
  • IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
  • Year:
  • 2008

Abstract

Thermal issues are fast becoming major design constraints in high-performance systems. Temperature variations adversely affect system reliability and prompt worst-case design. In recent history, researchers have proposed dynamic thermal-management (DTM) techniques targeting average-case design and tackling the temperature issue at runtime. While past work on DTM has focused on different techniques in isolation, it fails to consider a system-level approach which uses both hardware and software support in a synergistic fashion and hence leads to a significant execution-time overhead. In this paper, we propose HybDTM, a system-level framework for doing fine-grained coordinated thermal management using a hybrid of hardware techniques (like clock gating) and software techniques (like thermal-aware process scheduling), leveraging the advantages of both approaches in a synergistic fashion. We show that while hardware techniques can be used reactively to manage the overall temperature in case of thermal emergencies, proactive use of software techniques can build on top of it to balance the overall thermal profile with minimal overhead using the operating system (OS) support. In order to evaluate our proposed hybrid-DTM policy, we develop a novel regression-based thermal model, providing fast and accurate temperature estimates to do runtime thermal characterization of all applications running on the system, using hardware performance counters available in modern high-performance processors alongside thermal sensors for training the model at runtime. Our model is validated against actual temperature measurements from online thermal sensors, with the average estimation error found to be less than 5%. We also study system-level DTM issues, jointly considering both the processor and memory, and show how a unified DTM approach can benefit from global knowledge of individual system components. 
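As an illustration of the regression-based idea described above, the following is a minimal sketch, not the authors' model. It assumes temperature can be approximated as a linear function of a few performance-counter rates and fits the weights by ordinary least squares against thermal-sensor readings. The function names, the choice of two counters, and the sample values are all hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_thermal_model(counter_samples, sensor_temps):
    """Least-squares fit of temp ~ w0 + sum(w_i * counter_i).

    counter_samples: list of tuples of counter rates (hypothetical features).
    sensor_temps: matching thermal-sensor readings used as training targets.
    """
    rows = [[1.0] + list(sample) for sample in counter_samples]
    k = len(rows[0])
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, sensor_temps)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def estimate_temperature(weights, counters):
    """Estimate temperature from counter rates using the fitted weights."""
    return weights[0] + sum(w * c for w, c in zip(weights[1:], counters))


# Hypothetical training data: (instructions per cycle, cache misses per kilo-cycle)
samples = [(0.5, 0.1), (1.0, 0.4), (1.5, 0.2), (2.0, 0.8)]
temps = [45.5, 52.0, 56.0, 64.0]  # synthetic sensor readings in degrees Celsius
weights = fit_thermal_model(samples, temps)
estimate = estimate_temperature(weights, (1.2, 1.3))
```

In a runtime setting, the fit would be repeated periodically as new sensor readings arrive, so that the estimates track the applications currently running.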
We evaluate our proposed methodology on a desktop system with an Intel Pentium-4 processor and a modified Linux OS, running a number of SPEC2000 benchmarks, in both uniprocessor and simultaneous multithreaded environments, and show that our proposed technique is able to successfully manage the overall temperature with an average execution-time overhead of only 10.4% (20.1% maximum) compared to the case without any DTM, as opposed to 23.9% (46% maximum) overhead for purely hardware-based DTM. Our system, including the thermal-aware OS, built-in runtime thermal-characterization model, and interface to the underlying hardware using the Pentium-4 processor, is ready for release.
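The split between reactive hardware control and proactive software control can be sketched as a single decision step. This is an illustrative sketch only, not the HybDTM implementation: the two thresholds, the `Process` record, and the `dtm_step` function are assumptions introduced here, and the per-process temperature estimates are assumed to come from some runtime thermal model.

```python
from dataclasses import dataclass

# Hypothetical thresholds in degrees Celsius; the abstract does not give values.
EMERGENCY_C = 70.0   # at or above this, fall back to hardware clock gating
PROACTIVE_C = 65.0   # at or above this, prefer the coolest runnable process


@dataclass
class Process:
    name: str
    est_temp_c: float  # estimated steady-state temperature if this process runs


def dtm_step(sensor_temp_c, run_queue):
    """Pick one action for the next scheduling interval.

    run_queue is assumed to be a non-empty list of Process records
    in the scheduler's default order.
    """
    if sensor_temp_c >= EMERGENCY_C:
        # Reactive hardware path: thermal emergency, gate the clock.
        return ("clock_gate", None)
    if sensor_temp_c >= PROACTIVE_C:
        # Proactive software path: run the process expected to be coolest.
        coolest = min(run_queue, key=lambda p: p.est_temp_c)
        return ("schedule", coolest.name)
    # Comfortable temperature: keep the default scheduling order.
    return ("schedule", run_queue[0].name)


queue = [Process("hot_job", 72.0), Process("cool_job", 58.0)]
emergency_action = dtm_step(71.0, queue)
proactive_action = dtm_step(66.0, queue)
normal_action = dtm_step(55.0, queue)
```

The design intent of such a scheme is that the software path handles most intervals cheaply, while clock gating remains a safety net for cases the scheduler cannot prevent.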