Thermal monitoring mechanisms for chip multiprocessors

Authors:
Jieyi Long;Seda Ogrenci Memik;Gokhan Memik;Rajarshi Mukherjee
Affiliations:
Northwestern University, Evanston, Illinois;Northwestern University, Evanston, Illinois;Northwestern University, Evanston, Illinois;Synopsys Inc., Mountain View, California
Venue:
ACM Transactions on Architecture and Code Optimization (TACO)
Year:
2008

Abstract

With large-scale integration and increasing power densities, thermal management has become an important tool to maintain performance and reliability in modern process technologies. At the core of dynamic thermal management schemes lies the accurate reading of on-die temperatures. Therefore, careful planning and embedding of thermal monitoring mechanisms into high-performance systems become crucial. In this paper, we propose three techniques to create sensor infrastructures for monitoring the maximum temperature on a multicore system. Initially, we extend a nonuniform sensor placement methodology proposed in the literature to handle chip multiprocessors (CMPs) and show its limitations. We then analyze a grid-based approach, where the sensors are placed on a static grid covering each core, and show that the sensor readings can differ from the actual maximum core temperature by as much as 12.6°C when using 16 sensors per core. Moreover, as many as 10.6% of the thermal emergencies are not captured using the same number of sensors. Based on this observation, we first develop an interpolation scheme, which estimates the maximum core temperature through interpolation of the readings collected at the static grid points. We show that the interpolation scheme improves the measurement accuracy and emergency coverage compared to grid-based placement when using the same number of sensors. Second, we present a dynamic scheme where only a subset of the sensor readings is collected to predict the maximum temperature of each core. Our results indicate that we can reduce the number of active sensors by as much as 50% while maintaining similar measurement accuracy and emergency coverage compared to the case where the entire sensor set on the grid is sampled at all times.
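
The idea of estimating a core's maximum temperature from static grid readings can be illustrated with a minimal sketch. The abstract does not specify the interpolation method, so the code below swaps in a generic technique, parabolic peak interpolation around the hottest sensor, purely as an assumption. The function names (`parabolic_peak`, `hottest_sensor`, `estimate_core_max`, `has_emergency`), the 4x4 grid layout, and the emergency threshold are all hypothetical and not taken from the paper.

```python
from typing import List, Tuple

Grid = List[List[float]]  # grid[row][col] = sensor reading in degrees C


def parabolic_peak(left: float, center: float, right: float) -> float:
    """Peak of the parabola through (-1, left), (0, center), (1, right).

    Assumes center >= left and center >= right, so the fitted peak lies
    within half a grid step of the center sample.
    """
    curvature = left - 2.0 * center + right
    if curvature >= 0.0:  # all three readings equal: no peak to recover
        return center
    return center - (left - right) ** 2 / (8.0 * curvature)


def hottest_sensor(grid: Grid) -> Tuple[int, int]:
    """Return (row, col) of the highest raw sensor reading."""
    cells = ((r, c) for r in range(len(grid)) for c in range(len(grid[0])))
    return max(cells, key=lambda rc: grid[rc[0]][rc[1]])


def estimate_core_max(grid: Grid) -> float:
    """Estimate the core's peak temperature from grid readings.

    Assumption (not the paper's method): the temperature near the hottest
    sensor is modeled as a separable quadratic, so the corrections along
    the row and column directions simply add. On a grid edge, where a
    neighbor is missing, no correction is applied along that axis.
    """
    r, c = hottest_sensor(grid)
    center = grid[r][c]
    estimate = center
    if 0 < c < len(grid[0]) - 1:
        estimate += parabolic_peak(grid[r][c - 1], center, grid[r][c + 1]) - center
    if 0 < r < len(grid) - 1:
        estimate += parabolic_peak(grid[r - 1][c], center, grid[r + 1][c]) - center
    return estimate


def has_emergency(grid: Grid, threshold: float) -> bool:
    """Flag a thermal emergency using the interpolated estimate."""
    return estimate_core_max(grid) >= threshold


if __name__ == "__main__":
    # Synthetic 4x4 readings (16 sensors) with a hotspot between grid points.
    readings = [
        [80.0 - 2.0 * (c - 1.3) ** 2 - 3.0 * (r - 2.4) ** 2 for c in range(4)]
        for r in range(4)
    ]
    print("hottest raw reading:", max(max(row) for row in readings))
    print("interpolated estimate:", estimate_core_max(readings))
```

In this synthetic case the hotspot falls between grid points, so the hottest raw reading underestimates the true peak while the interpolated estimate recovers it. This mirrors, in a simplified way, why raw grid readings can miss emergencies that an interpolation step captures.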