Fighting fire with fire: modeling the datacenter-scale effects of targeted superlattice thermal management

  • Authors:
  • Susmit Biswas;Mohit Tiwari;Timothy Sherwood;Luke Theogarajan;Frederic T. Chong

  • Affiliations:
  • Lawrence Livermore National Laboratory, Livermore, CA, USA;University of California, Santa Barbara, Santa Barbara, CA, USA;University of California, Santa Barbara, Santa Barbara, CA, USA;University of California, Santa Barbara, Santa Barbara, CA, USA;University of California, Santa Barbara, Santa Barbara, CA, USA

  • Venue:
  • Proceedings of the 38th annual international symposium on Computer architecture
  • Year:
  • 2011

Abstract

Local thermal hot-spots in microprocessors lead to worst-case provisioning of global cooling resources, especially in large-scale systems where cooling power can be 50-100% of IT power. Further, the efficiency of cooling solutions degrades non-linearly with supply temperature. Recent advances in active cooling techniques have shown on-chip thermoelectric coolers (TECs) to be very efficient at selectively eliminating small hot-spots. Applying current to a superlattice TEC film deposited between the silicon and the heat spreader produces a Peltier effect, which spreads the heat, significantly lowers the temperature of the hot-spot, and improves chip reliability. In this paper, we propose that hot-spot mitigation using thermoelectric coolers can be used as a power management mechanism that allows global coolers to be provisioned for a better worst-case temperature, leading to substantial savings in cooling power. To quantify the potential power savings from using TECs in data center servers, we present a detailed power model that integrates on-chip dynamic and leakage power sources, heat diffusion through the entire chip, TEC and global cooler efficiencies, and all their mutual interactions. Our multi-scale analysis shows that, for a typical data center, TECs allow global coolers to operate at higher temperatures without degrading chip lifetime, and thus save ~27% of cooling power on average while providing the same processor reliability as a data center running at 288 K.
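The trade-off described in the abstract can be illustrated with a minimal sketch. It is not the paper's power model: it ignores leakage, heat diffusion, and reliability entirely. The function names, the `toy_cop` curve, and every number below are hypothetical placeholders chosen only to show the bookkeeping, namely that TECs cost power and add heat, while a warmer supply temperature raises the global cooler's coefficient of performance (COP).

```python
def toy_cop(supply_temp_k):
    """Hypothetical, non-linear COP curve for a global cooler.

    The coefficients are illustrative placeholders, not measured values.
    """
    t_c = supply_temp_k - 273.15
    return 0.5 + 0.005 * t_c ** 2


def cooling_power_w(it_power_w, supply_temp_k, tec_power_w=0.0, cop=toy_cop):
    """Total cooling power: TEC drive power plus global cooler power.

    Assumption: the TEC drive power ends up as heat that the global
    cooler must also remove.
    """
    heat_w = it_power_w + tec_power_w
    return tec_power_w + heat_w / cop(supply_temp_k)


def savings_fraction(baseline_w, candidate_w):
    """Fraction of baseline cooling power saved by the candidate setup."""
    return (baseline_w - candidate_w) / baseline_w


# Illustrative comparison with made-up inputs: a baseline at 288 K versus
# a warmer supply temperature made tolerable by TEC hot-spot mitigation.
baseline = cooling_power_w(it_power_w=1000.0, supply_temp_k=288.0)
with_tec = cooling_power_w(it_power_w=1000.0, supply_temp_k=298.0,
                           tec_power_w=20.0)
saved = savings_fraction(baseline, with_tec)
```

With these placeholder inputs the TEC configuration comes out ahead only because the assumed COP gain outweighs the added TEC power; the size of the saving here says nothing about the ~27% figure reported in the abstract.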