A genetic algorithm for the generalised assignment problem
Computers and Operations Research
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
Temperature-aware microarchitecture: Modeling and implementation
ACM Transactions on Architecture and Code Optimization (TACO)
The Case for Lifetime Reliability-Aware Microprocessors
Proceedings of the 31st annual international symposium on Computer architecture
Heat-and-run: leveraging SMT and CMP to manage power density through the operating system
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Deep Submicron CMOS Integrated Circuit Reliability Simulation with SPICE
ISQED '05 Proceedings of the 6th International Symposium on Quality of Electronic Design
Techniques for Multicore Thermal Management: Classification and New Exploration
Proceedings of the 33rd annual international symposium on Computer Architecture
ElastIC: An Adaptive Self-Healing Architecture for Unpredictable Silicon
IEEE Design & Test
ReCycle:: pipeline adaptation to tolerate process variation
Proceedings of the 34th annual international symposium on Computer architecture
Thermal-aware task scheduling at the system software level
ISLPED '07 Proceedings of the 2007 international symposium on Low power electronics and design
Self-calibrating Online Wearout Detection
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Variation-Aware Application Scheduling and Power Management for Chip Multiprocessors
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Facelift: Hiding and slowing down aging in multicores
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
The StageNet fabric for constructing resilient multicore systems
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Circuit reliability: from physics to architectures
Proceedings of the International Conference on Computer-Aided Design
Cost-effective lifetime and yield optimization for NoC-based MPSoCs
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Unified reliability estimation and management of NoC based chip multiprocessors
Microprocessors & Microsystems
Hi-index | 0.00 |
As CMOS feature sizes venture deep into the nanometer regime, wearout mechanisms including negative-bias temperature instability and time-dependent dielectric breakdown can severely reduce processor operating lifetimes and performance. This paper presents an introspective reliability management system, Maestro, to tackle reliability challenges in future chip multiprocessors (CMPs) head-on. Unlike traditional approaches, Maestro relies on low-level sensors to monitor the CMP as it ages (introspection). Leveraging this real-time assessment of CMP health, runtime heuristics identify wearout-centric job assignments (management). By exploiting the complementary effects of the natural heterogeneity (due to process variation and wearout) that exists in CMPs and the diversity found in system workloads, Maestro composes job schedules that intelligently control the aging process. Monte Carlo experiments show that Maestro significantly enhances lifetime reliability through intelligent wear-leveling, increasing the expected service life of a population of 16-core CMPs by as much as 38% compared to a naive, round-robin scheduler. Furthermore, in the presence of process variation, Maestro's wearout-centric scheduling outperformed both performance counter and temperature sensor based schedulers, achieving an order of magnitude more improvement in lifetime throughput – the amount of useful work done by a system prior to failure.