Dynamic Thermal Management through Task Scheduling

  • Authors:
  • Jun Yang;Xiuyi Zhou;Marek Chrobak;Youtao Zhang;Lingling Jin

  • Affiliations:
  • Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh PA 15261;Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh PA 15261;Computer Science, University of California, Riverside, CA 92521;Computer Science, University of Pittsburgh, Pittsburgh PA 15261;Nvidia Corporate, Santa Clara, CA 95050

  • Venue:
  • ISPASS '08 Proceedings of the ISPASS 2008 - IEEE International Symposium on Performance Analysis of Systems and Software
  • Year:
  • 2008

Abstract

The evolution of microprocessors has been hindered by their increasing power consumption and the rate of on-die heat generation. High temperature impairs a processor's reliability and shortens its lifetime. Hardware-level dynamic thermal management (DTM) techniques, such as voltage and frequency scaling, can effectively lower the chip temperature when it surpasses the thermal threshold, but they inevitably degrade performance. We propose an OS-level technique that performs thermal-aware job scheduling to reduce the number of thermal trespasses. Our scheduler reduces the number of hardware DTM invocations and achieves higher performance while keeping the temperature low. Our methods leverage the natural differences in thermal behavior among workloads and schedule them so that the chip temperature stays below a given budget. We develop a heuristic algorithm based on the observation that the resulting temperature differs depending on the order in which a hot job and a cool job are executed. To enable informed scheduling decisions, we developed a lightweight runtime temperature monitor. We have implemented our scheduling algorithm and the entire temperature monitoring framework in the Linux kernel. Our proposed scheduler removes 10.5-73.6% of the hardware DTMs across various combinations of workloads in a medium thermal environment. As a result, CPU throughput improved by up to 7.6% (4.1% on average), even under a severe thermal environment.
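
The order-dependence observation can be illustrated with a minimal sketch. It assumes a first-order lumped RC thermal model, which is a common simplification and not necessarily the model used by the authors. All names and constants here (`step`, `run`, `r_th`, `tau`, `t_amb`, the job power values, the starting temperature) are hypothetical and chosen only for illustration.

```python
import math


def step(temp, power, duration, r_th=0.8, tau=10.0, t_amb=45.0):
    """Advance an assumed lumped RC thermal model by one job slice.

    temp     -- die temperature at the start of the slice (deg C)
    power    -- average power drawn by the job (W)
    duration -- slice length (s)
    r_th     -- assumed thermal resistance (deg C per W)
    tau      -- assumed thermal time constant (s)
    t_amb    -- assumed ambient temperature (deg C)
    """
    steady = t_amb + r_th * power  # temperature the job would settle at
    return steady + (temp - steady) * math.exp(-duration / tau)


def run(order, start_temp, duration=5.0):
    """Return the temperature after each job when run in the given order."""
    temps = []
    temp = start_temp
    for power in order:
        temp = step(temp, power, duration)
        temps.append(temp)
    return temps


# Hypothetical job powers: one "hot" job and one "cool" job.
HOT_W, COOL_W = 60.0, 20.0

hot_first = run([HOT_W, COOL_W], start_temp=70.0)
cool_first = run([COOL_W, HOT_W], start_temp=70.0)

print("hot then cool:", [round(t, 2) for t in hot_first])
print("cool then hot:", [round(t, 2) for t in cool_first])
```

Under these assumptions the same two jobs leave the chip at different temperatures depending on their order: in this toy model, running the hot job first ends at a lower final temperature because the cool job that follows lets the die relax. A thermal-aware scheduler could exploit this kind of asymmetry, although the heuristic actually used in the paper may differ.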