IBM's S/390 G5 Microprocessor Design
IEEE Micro
An Experimental Study of Security Vulnerabilities Caused by Errors
DSN '01 Proceedings of the 2001 International Conference on Dependable Systems and Networks (formerly: FTCS)
Using Memory Errors to Attack a Virtual Machine
SP '03 Proceedings of the 2003 IEEE Symposium on Security and Privacy
Seven Strategies for Tolerating Highly Defective Fabrication
IEEE Design & Test
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Trading off Cache Capacity for Reliability to Enable Low Voltage Operation
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Entering the petaflop era: the architecture and performance of Roadrunner
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines
The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines
The use of triple-modular redundancy to improve computer reliability
IBM Journal of Research and Development
On-Orbit Flight Results from the Reconfigurable Cibola Flight Experiment Satellite (CFESat)
FCCM '09 Proceedings of the 2009 17th IEEE Symposium on Field Programmable Custom Computing Machines
Interests and limitations of technology scaling for subthreshold logic
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A resilience roadmap: (invited paper)
Proceedings of the Conference on Design, Automation and Test in Europe
Design techniques for cross-layer resilience
Proceedings of the Conference on Design, Automation and Test in Europe
Cross-layer resilience challenges: metrics and optimization
Proceedings of the Conference on Design, Automation and Test in Europe
Reliability challenges of real-time systems in forthcoming technology nodes
Proceedings of the Conference on Design, Automation and Test in Europe
Accurate and efficient reliability estimation techniques during ADL-driven embedded processor design
Proceedings of the Conference on Design, Automation and Test in Europe
Quantitative evaluation of soft error injection techniques for robust system design
Proceedings of the 50th Annual Design Automation Conference
Unified reliability estimation and management of NoC based chip multiprocessors
Microprocessors & Microsystems
Hi-index | 0.00 |
We are rapidly approaching an inflection point where the conventional target of producing perfect, identical transistors that operate without upset can no longer be maintained while continuing to reduce the energy per operation. With power requirements already limiting chip performance, continuing to demand perfect, upset-free transistors would mean the end of scaling benefits. The big challenges in device variability and reliability are driven by uncommon tails in distributions, infrequent upsets, one-size-fits-all technology requirements, and a lack of information about the context of each operation. Solutions co-designed across traditional layer boundaries in our system stack can change the game, allowing architecture and software (a) to compensate for uncommon variation, environments, and events, (b) to pass down invariants and requirements for the computation, and (c) to monitor the health of collections of devices. Cross-layer codesign provides a path to continue extracting benefits from further scaled technologies despite the fact that they may be less predictable and more variable. While some limited multi-layer mitigation strategies do exist, to move forward redefining traditional layer abstractions and developing a framework that facilitates cross-layer collaboration is necessary.