Thermal prediction and adaptive control through workload phase detection

  • Authors: Ryan Cochran; Sherief Reda

  • Affiliations: Brown University; Brown University

  • Venue: ACM Transactions on Design Automation of Electronic Systems (TODAES), Special section on adaptive power management for energy and temperature-aware computing systems
  • Year: 2013

Abstract

Elevated die temperature is a true limiter to the scalability of modern processors. As technology scaling continues to meet ever-increasing performance demands, it is no longer cost effective to design cooling systems that handle worst-case thermal behavior. Instead, cooling systems are designed to handle typical chip operation, while processors must detect and handle rare thermal emergencies. Most processors rely on measurements from integrated thermal sensors and on dynamic thermal management (DTM) techniques to manage the trade-off between performance and thermal risk. Optimal management requires advance knowledge of the thermal trajectory based on the current workload behavior and operating conditions. In this work, we devise novel workload phase classification strategies that automatically discriminate among workload behaviors with respect to the thermal control response. We incorporate workload phase detection and thermal models into a dynamic voltage and frequency scaling (DVFS) technique that can optimally control temperature at runtime based on thermal predictions. We demonstrate the effectiveness of our proposed techniques in predicting and adaptively controlling the thermal behavior of a real quad-core processor in response to a wide range of workloads. In comparison with state-of-the-art model predictive control (MPC) techniques from previous work on thermal prediction, we demonstrate a 5.8% improvement in instruction throughput with the same number of thermal violations. In comparison with simple proportional-integral (PI) feedback control techniques, we improve instruction throughput by 3.9% while significantly reducing the number of thermal violations.
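
To make the general idea of phase-aware predictive DVFS concrete, the following is a minimal illustrative sketch and not the authors' method. It assumes a nearest-centroid phase classifier over hypothetical performance-counter features and a one-step first-order thermal model. All names, coefficients, and power figures (`Phase`, `classify`, `predict_temp`, `pick_frequency`, `decay`, `gain`, `power_per_ghz`) are invented for illustration.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Phase:
    name: str
    centroid: Tuple[float, ...]  # hypothetical performance-counter features
    power_per_ghz: float         # hypothetical watts drawn per GHz in this phase


def classify(features: Sequence[float], phases: Sequence[Phase]) -> Phase:
    """Return the phase whose centroid is nearest (squared Euclidean distance)."""
    return min(
        phases,
        key=lambda p: sum((f - c) ** 2 for f, c in zip(features, p.centroid)),
    )


def predict_temp(temp: float, power: float, ambient: float = 45.0,
                 decay: float = 0.9, gain: float = 0.5) -> float:
    """One-step first-order model (assumed): T' = ambient + decay*(T - ambient) + gain*power."""
    return ambient + decay * (temp - ambient) + gain * power


def pick_frequency(temp: float, phase: Phase,
                   freqs: Sequence[float], limit: float) -> float:
    """Pick the highest frequency whose predicted next temperature stays within the limit.

    Falls back to the lowest frequency if no setting is predicted to be safe.
    """
    safe = [f for f in freqs
            if predict_temp(temp, phase.power_per_ghz * f) <= limit]
    return max(safe) if safe else min(freqs)


# Hypothetical phases and frequency settings (GHz).
PHASES = [
    Phase("compute", (2.0, 0.1), 10.0),
    Phase("memory", (0.5, 0.8), 4.0),
]
FREQS = [1.0, 2.0, 3.0]
```

For example, `pick_frequency(70.0, classify((1.8, 0.2), PHASES), FREQS, 80.0)` selects a lower frequency for the power-hungry phase than the same call with features `(0.6, 0.7)`. A purely reactive PI controller sees only the temperature error, whereas this sketch uses the detected phase to predict the temperature before choosing a setting.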