The SimpleScalar tool set, version 2.0
ACM SIGARCH Computer Architecture News
IEEE Internet Computing
The case for power management in web servers
Power aware computing
Reducing power density through activity migration
Proceedings of the 2003 international symposium on Low power electronics and design
Temperature-aware microarchitecture: Modeling and implementation
ACM Transactions on Architecture and Code Optimization (TACO)
The Case for Lifetime Reliability-Aware Microprocessors
Proceedings of the 31st annual international symposium on Computer architecture
Exploiting Structural Duplication for Lifetime Reliability Enhancement
Proceedings of the 32nd annual international symposium on Computer Architecture
Interconnect lifetime prediction under dynamic stress for reliability-aware design
Proceedings of the 2004 IEEE/ACM International conference on Computer-aided design
Static electromigration analysis for on-chip signal interconnects
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Interconnect lifetime prediction for reliability-aware systems
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
NBTI resilient circuits using adaptive body biasing
Proceedings of the 18th ACM Great Lakes symposium on VLSI
Credit-based dynamic reliability management using online wearout detection
Proceedings of the 5th conference on Computing frontiers
Performance-aware thermal management via task scheduling
ACM Transactions on Architecture and Code Optimization (TACO)
Maestro: orchestrating lifetime reliability in chip multiprocessors
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Recent thermal management techniques for microprocessors
ACM Computing Surveys (CSUR)
Circuit reliability: from physics to architectures
Proceedings of the International Conference on Computer-Aided Design
Unified reliability estimation and management of NoC based chip multiprocessors
Microprocessors & Microsystems
Hi-index | 0.00 |
Most existing integrated circuit (IC) reliability models assume a uniform, typically worst-case, operating temperature, but temporal and spatial temperature variations affect expected device lifetime. As a result, design decisions and dynamic thermal management (DTM) techniques using worst-case models are pessimistic and result in excessive design margins and unnecessary runtime engagement of cooling mechanisms (and associated performance penalties). By leveraging a reliability model that accounts for temperature gradients (dramatically improving interconnect lifetime prediction accuracy) and modeling expected lifetime as a resource that is consumed over time at a temperature-and voltage-dependent rate, substantial design margin can be reclaimed and runtime penalties avoided while meeting expected lifetime requirements. In this paper, we evaluate the potential benefits and implementations of this technique by tracking the expected lifetime of a system under different workloads while accounting for the impact of dynamic voltage and temperature variations. Simulation results show that our dynamic reliability management (DRM) techniques provide a 40% performance penalty reduction over that incurred by pessimistic DTM in general-purpose computing and a 10% increase in quality of service (QoS) for servers, all while preserving the expected IC lifetime reliability.