Heat-and-run: leveraging SMT and CMP to manage power density through the operating system

  • Authors:
  • Mohamed Gomaa;Michael D. Powell;T. N. Vijaykumar

  • Affiliations:
  • Purdue University, West Lafayette, IN;Purdue University, West Lafayette, IN;Purdue University, West Lafayette, IN

  • Venue:
  • ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
  • Year:
  • 2004

Abstract

Power density in high-performance processors continues to increase with technology generations as scaling of current, clock speed, and device density outpaces the downscaling of supply voltage and the thermal ability of packages to dissipate heat. Power density is characterized by localized chip hot spots that can reach critical temperatures and cause failure. Previous architectural approaches to power density have used global clock gating, fetch toggling, dynamic frequency scaling, or resource duplication to either prevent heating or relieve overheated resources in a superscalar processor. Previous approaches also evaluate design technologies where power density is not a major problem and most applications do not overheat the processor. Future processors, however, are likely to be chip multiprocessors (CMPs) with simultaneously-multithreaded (SMT) cores. SMT CMPs pose unique challenges and opportunities for power density. SMT and CMP increase throughput and thus on-chip heat, but also provide natural granularities for managing power density. This paper is the first work to leverage SMT and CMP to address power density. We propose heat-and-run SMT thread assignment to increase processor-resource utilization before cooling becomes necessary by co-scheduling threads that use complementary resources. We propose heat-and-run CMP thread migration to migrate threads away from overheated cores and assign them to free SMT contexts on alternate cores, leveraging availability of SMT contexts on alternate CMP cores to maintain throughput while allowing overheated cores to cool. We show that our proposal has an average of 9% and up to 34% higher throughput than a previous superscalar technique running the same number of threads.
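
The migration idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Core` class, the `migrate_from_hot_cores` function, the single temperature threshold, and the "pick the coolest core with a free context" rule are all assumptions made for illustration. The abstract only states that threads move away from overheated cores to free SMT contexts on alternate cores.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    """Hypothetical model of one CMP core with several SMT contexts."""
    core_id: int
    temperature: float   # assumed sensor reading, degrees Celsius
    contexts: int        # number of SMT contexts on this core
    threads: list = field(default_factory=list)

    def free_contexts(self) -> int:
        return self.contexts - len(self.threads)


def migrate_from_hot_cores(cores, threshold):
    """Move threads off cores at or above `threshold` onto free SMT
    contexts of cooler cores. Returns a list of (thread, src_id, dst_id).

    Assumption: the coolest eligible core is chosen as the target; the
    source text does not specify a target-selection rule.
    """
    moves = []
    hot = [c for c in cores if c.temperature >= threshold]
    for src in hot:
        for thread in list(src.threads):
            # Eligible targets: cores below the threshold with a free context.
            targets = [c for c in cores
                       if c.temperature < threshold and c.free_contexts() > 0]
            if not targets:
                break  # no free context anywhere; the thread stays put
            dst = min(targets, key=lambda c: c.temperature)
            src.threads.remove(thread)
            dst.threads.append(thread)
            moves.append((thread, src.core_id, dst.core_id))
    return moves


cores = [
    Core(0, 90.0, 2, ["A", "B"]),  # overheated core
    Core(1, 60.0, 2, ["C"]),
    Core(2, 55.0, 2, []),
]
moves = migrate_from_hot_cores(cores, threshold=85.0)
print(moves)
```

In this toy run, core 0 is emptied so it can cool while both of its threads keep running on free contexts of core 2, which is the throughput-preserving behavior the abstract describes. The sketch does not model temperatures changing over time or the SMT thread-assignment half of the proposal.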