Automatic control systems (7th ed.)
Automatic control systems (7th ed.)
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
Temperature-aware microarchitecture
Proceedings of the 30th annual international symposium on Computer architecture
The Case for Lifetime Reliability-Aware Microprocessors
Proceedings of the 31st annual international symposium on Computer architecture
Statistical Timing Analysis Considering Spatial Correlations using a Single Pert-Like Traversal
Proceedings of the 2003 IEEE/ACM international conference on Computer-aided design
Interconnect lifetime prediction under dynamic stress for reliability-aware design
Proceedings of the 2004 IEEE/ACM International conference on Computer-aided design
Reliability modeling and management in dynamic microprocessor-based systems
Proceedings of the 43rd annual Design Automation Conference
Self-calibrating Online Wearout Detection
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
A statistical approach for full-chip gate-oxide reliability analysis
Proceedings of the 2008 IEEE/ACM International Conference on Computer-Aided Design
In-field aging measurement and calibration for power-performance optimization
Proceedings of the 48th Design Automation Conference
PGCapping: exploiting power gating for power capping and core lifetime balancing in CMPs
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Fine-grained hardware/software methodology for process migration in MPSoCs
Proceedings of the International Conference on Computer-Aided Design
Workload and user experience-aware dynamic reliability management in multicore processors
Proceedings of the 50th Annual Design Automation Conference
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Statistical thermal modeling and optimization considering leakage power variations
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Unified reliability estimation and management of NoC based chip multiprocessors
Microprocessors & Microsystems
Hi-index | 0.00 |
In aggressively scaled technologies, reliability concerns such as oxide breakdown have become a key issue. Dynamic reliability management (DRM) has been proposed as a mechanism to dynamically explore the tradeoff between system performance and reliability margin. However, existing DRM methods are hampered by the fact that they do not accurately model spatial and temporal variations in process and temperature parameters which have a strong impact on chip reliability. In addition, they make the simplifying assumption that the future workloads are identical to the currently observed one. This makes them sensitive to sudden workload variations and outliers. In this paper, we present a novel workload-aware dynamic reliability management framework that accounts for local variations in both the process and temperature. The reliability estimation, along with the predicted remaining workload is fed to a dynamic voltage/frequency scaling module to manage the system reliability and optimize processor performance. Using a fast on-line analytical/table-look-up method we demonstrate an average error of 1% with up to 5 orders of magnitude speedup compared to Monte Carlo simulation. Experiments on an Alphalike processor show our DRM framework fully utilizes the available margin and achieves 28.7% performance improvement on average.