This paper presents a software architecture for hardware fault tolerance based on loosely-synchronized, redundant virtual machines (LSRVM). LSRVM will provide high levels of reliability by tolerating hardware faults at all levels of the system. Historically, such hardware fault tolerance has been achievable only with custom-designed hardware and proprietary operating systems. Today, however, technological trends and economic factors are driving a reduction in the amount of custom-designed hardware. We believe that this path should be followed to its ultimate conclusion: a highly available, fault-tolerant computing system based entirely on commodity hardware and open-source operating systems. Our approach uses virtualization to provide redundancy efficiently on modern commodity hardware. When combined with existing application-level fault tolerance mechanisms, LSRVM will provide very high levels of reliability at extremely low cost.