Reliability is a major concern for nanoscale CMOS circuits. Degradation phenomena such as Electromigration, Negative Bias Temperature Instability, and Time-Dependent Dielectric Breakdown worsen with transistor scaling. Dynamic Reliability Management (DRM) techniques reduce reliability loss at runtime by constraining operating points, but they must limit the resulting degradation of user experience while still meeting a lifetime target. In this work we propose a sensor-based hierarchical controller for multicore processor DRM that exploits the large gap between the time scales of workload variations and reliability loss. We improve performance and user experience by locally relaxing reliability-induced operating point constraints, while still meeting them over the long time windows relevant for reliability. Compared with the state of the art, our solution guarantees timely execution of 100% of latency-critical applications and achieves a 4% performance improvement over the whole lifetime.
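The idea of relaxing constraints locally while meeting them over a long window can be illustrated with a minimal sketch. It is not the controller proposed in this work: the operating points, the damage values, and the names `ReliabilityBudget`, `choose_point`, and `simulate` are all hypothetical, and per-epoch damage is given as a fixed number where a real system would read degradation sensors. The slow level tracks how much reliability budget remains in the window; the fast level lets latency-critical epochs run at the fastest point and makes non-critical epochs pay the budget back.

```python
from dataclasses import dataclass

# Hypothetical operating points as (name, damage per epoch), fastest first.
# The damage values are illustrative, not measured data.
OPERATING_POINTS = [("high", 3.0), ("nominal", 1.0), ("low", 0.4)]


@dataclass
class ReliabilityBudget:
    """Slow level: tracks damage spent over a long window of epochs."""
    target_rate: float  # allowed average damage per epoch (assumed unit)
    window: int         # number of epochs in the long-term window
    spent: float = 0.0
    epoch: int = 0

    def remaining_rate(self) -> float:
        """Average damage per epoch still affordable for the rest of the window."""
        left = self.window - self.epoch
        if left <= 0:
            return 0.0
        return (self.target_rate * self.window - self.spent) / left

    def record(self, damage: float) -> None:
        self.spent += damage
        self.epoch += 1


def choose_point(budget: ReliabilityBudget, latency_critical: bool):
    """Fast level: pick an operating point for the next epoch."""
    if latency_critical:
        # Relax the constraint locally so the critical task runs at full speed.
        return OPERATING_POINTS[0]
    rate = budget.remaining_rate()
    for point in OPERATING_POINTS:
        if point[1] <= rate:
            return point
    # Nothing fits the remaining budget: fall back to the gentlest point.
    return OPERATING_POINTS[-1]


def simulate(workload, target_rate=1.0):
    """Run one window; workload is a list of booleans (True = latency-critical)."""
    budget = ReliabilityBudget(target_rate, len(workload))
    trace = []
    for critical in workload:
        name, damage = choose_point(budget, critical)
        budget.record(damage)
        trace.append(name)
    return trace, budget.spent


if __name__ == "__main__":
    trace, spent = simulate([True, False, False, False, False, False])
    print(trace, spent)
```

In this toy run the single latency-critical epoch is served at the fastest point, and the following epochs are throttled until the window's budget is recovered. The sketch does not guarantee the window target when critical epochs dominate the workload; handling that case is exactly where a real controller needs more care.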