Ensuring long processor lifetimes by limiting failures due to wear-out related hard errors is a critical requirement for all microprocessor manufacturers. We observe that continuous device scaling and increasing temperatures are making lifetime reliability targets even harder to meet. However, current methodologies for qualifying lifetime reliability are overly conservative since they assume worst-case operating conditions. This paper makes the case that the continued use of such methodologies will significantly and unnecessarily constrain performance. Instead, lifetime reliability awareness at the microarchitectural design stage can mitigate this problem, by designing processors that dynamically adapt in response to the observed usage to meet a reliability target.

We make two specific contributions. First, we describe an architecture-level model and its implementation, called RAMP, that can dynamically track lifetime reliability, responding to changes in application behavior. RAMP is based on state-of-the-art device models for different wear-out mechanisms. Second, we propose dynamic reliability management (DRM), a technique where the processor can respond to changing application behavior to maintain its lifetime reliability target. In contrast to current worst-case behavior based reliability qualification methodologies, DRM allows processors to be qualified for reliability at lower (but more likely) operating points than the worst case. Using RAMP, we show that this can save cost and/or improve performance, that dynamic voltage scaling is an effective response technique for DRM, and that dynamic thermal management neither subsumes nor is subsumed by DRM.
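As a rough illustration of the DRM idea described above, the following Python sketch tracks a running-average failure rate and moves between voltage/frequency levels to stay at a target. It is not RAMP and not the paper's controller: the Arrhenius-style `relative_fit` model, the activation energy, the reference temperature, and the names `DrmController` and `step` are all assumptions made for illustration only.

```python
import math

BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K


def relative_fit(temp_k, ref_temp_k=355.0, activation_ev=0.9):
    """Hypothetical Arrhenius-style failure rate, normalized to 1.0 at ref_temp_k.

    The activation energy and reference temperature are illustrative
    assumptions, not values taken from the paper.
    """
    exponent = (activation_ev / BOLTZMANN_EV) * (1.0 / ref_temp_k - 1.0 / temp_k)
    return math.exp(exponent)


class DrmController:
    """Toy controller: keep the lifetime-average failure rate at a target."""

    def __init__(self, target_fit, num_levels):
        self.target_fit = target_fit
        self.num_levels = num_levels
        self.level = num_levels - 1  # start at the highest voltage/frequency level
        self.fit_sum = 0.0
        self.intervals = 0

    def average_fit(self):
        """Average failure rate over all intervals observed so far."""
        return self.fit_sum / self.intervals if self.intervals else 0.0

    def step(self, temp_k):
        """Record one interval at temp_k and pick the next operating level."""
        self.fit_sum += relative_fit(temp_k)
        self.intervals += 1
        avg = self.average_fit()
        if avg > self.target_fit and self.level > 0:
            self.level -= 1  # over budget: scale voltage/frequency down
        elif avg < self.target_fit and self.level < self.num_levels - 1:
            self.level += 1  # under budget: reclaim performance
        return self.level
```

Because the controller budgets against the average over the whole observed history rather than the instantaneous temperature, a hot phase can be offset by later cool phases. This mirrors, in a very simplified form, the abstract's distinction between managing lifetime reliability and managing peak temperature.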