Increased power densities (and resultant temperatures) and other effects of device scaling are predicted to cause significant lifetime reliability problems in the near future. In this paper, we study two techniques that leverage microarchitectural structural redundancy for lifetime reliability enhancement. First, in structural duplication (SD), redundant microarchitectural structures are added to the processor and designated as spares. Spare structures can be turned on when the original structure fails, increasing the processor's lifetime. Second, graceful performance degradation (GPD) is a technique which exploits existing microarchitectural redundancy for reliability. Redundant structures that fail are shut down while still maintaining functionality, thereby increasing the processor's lifetime, but at a lower performance. Our analysis shows that exploiting structural redundancy can provide significant reliability benefits, and we present guidelines for efficient usage of these techniques by identifying situations where each is more beneficial. We show that GPD is the superior technique when only limited performance or cost resources can be sacrificed for reliability. Specifically, on average for our systems and applications, GPD increased processor reliability to 1.42 times the base value for less than a 5% loss in performance. On the other hand, for systems where reliability is more important than performance or cost, SD is more beneficial. SD increases reliability to 3.17 times the base value for 2.25 times the base cost, for our applications. Finally, a combination of the two techniques (SD+GPD) provides the highest reliability benefit.
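
As a rough illustration of how the two techniques change the point at which a processor is considered failed, the following minimal sketch compares four configurations in a toy Monte Carlo model. Everything in it is an assumption made for illustration and is not taken from the paper: the processor is reduced to a group of identical redundant units, each unit's lifetime is drawn from an exponential distribution with mean 1, spares are unpowered until needed, and the function names are hypothetical. Performance loss and area cost are not modeled.

```python
import random


def sample_lifetimes(n_units, rng):
    """Draw one failure time per unit (assumed exponential model, mean 1)."""
    return [rng.expovariate(1.0) for _ in range(n_units)]


def base_lifetime(units):
    """No redundancy exploited: the processor fails at the first unit failure."""
    return min(units)


def gpd_lifetime(units):
    """GPD: failed units are shut down; the processor runs until the last one fails."""
    return max(units)


def sd_lifetime(units, spares):
    """SD: each unit has a spare that takes over when the original fails.
    The processor fails once any unit and its spare have both failed."""
    return min(u + s for u, s in zip(units, spares))


def sd_gpd_lifetime(units, spares):
    """SD+GPD: spares take over first, then the processor degrades gracefully."""
    return max(u + s for u, s in zip(units, spares))


def mean_lifetimes(n_units=2, trials=10000, seed=0):
    """Estimate the mean time to failure of each configuration."""
    rng = random.Random(seed)
    totals = {"base": 0.0, "GPD": 0.0, "SD": 0.0, "SD+GPD": 0.0}
    for _ in range(trials):
        units = sample_lifetimes(n_units, rng)
        spares = sample_lifetimes(n_units, rng)  # spare lifetime counts from activation
        totals["base"] += base_lifetime(units)
        totals["GPD"] += gpd_lifetime(units)
        totals["SD"] += sd_lifetime(units, spares)
        totals["SD+GPD"] += sd_gpd_lifetime(units, spares)
    return {name: total / trials for name, total in totals.items()}


if __name__ == "__main__":
    for name, mttf in mean_lifetimes().items():
        print(f"{name}: {mttf:.3f}")
```

In this toy model SD, GPD, and SD+GPD can never do worse than the base configuration, and SD+GPD can never do worse than either technique alone, which matches the qualitative ordering stated in the abstract. The relative ranking of SD and GPD, and the size of each gain, depend entirely on the assumed lifetime distribution and unit count, so the sketch should not be expected to reproduce the 1.42x and 3.17x figures reported above.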