The development and validation of fault-tolerant computers for critical real-time applications are currently both costly and time-consuming. Often, the underlying technology is out of date by the time the computers are ready for deployment. Obsolescence can become a chronic problem when the systems in which these computers are embedded have lifetimes of several decades. This paper gives an overview of the work carried out in a project that tackles the issues of cost and rapid obsolescence by defining a generic fault-tolerant computer architecture based essentially on commercial off-the-shelf (COTS) components (both processor hardware boards and real-time operating systems). The architecture relies on a limited number of specific, but generic, hardware and software components and can be configured along three dimensions: redundant channels, redundant lanes, and integrity levels. The two dimensions of physical redundancy allow the definition of a wide variety of instances with different fault tolerance strategies. The integrity level dimension allows application components of different levels of criticality to coexist in the same instance. The paper describes the main concepts of the architecture, the supporting environments for development and validation, and the prototypes currently being implemented.
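As an illustration only, the three configuration dimensions named in the abstract can be pictured as a small record describing one instance of the architecture. The sketch below is not taken from the paper: the class name `InstanceConfig`, its field names, and the assumption that every channel holds the same number of lanes (so that the node count is channels times lanes) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceConfig:
    """Hypothetical description of one instance of the architecture.

    The three fields mirror the three dimensions named in the abstract;
    the names and the validation rules are assumptions for illustration.
    """
    channels: int          # number of redundant channels
    lanes: int             # number of redundant lanes (assumed per channel)
    integrity_levels: int  # number of criticality levels hosted together

    def __post_init__(self) -> None:
        # Each dimension needs at least one element to describe an instance.
        for name in ("channels", "lanes", "integrity_levels"):
            if getattr(self, name) < 1:
                raise ValueError(f"{name} must be at least 1")

    def processing_nodes(self) -> int:
        # Assumption: lanes are replicated identically inside every channel.
        return self.channels * self.lanes


# Two hypothetical instances with different physical redundancy.
simplex = InstanceConfig(channels=1, lanes=1, integrity_levels=1)
triplex = InstanceConfig(channels=3, lanes=2, integrity_levels=2)
print(simplex.processing_nodes(), triplex.processing_nodes())
```

Under these assumptions, varying `channels` and `lanes` changes the physical redundancy of an instance, while `integrity_levels` only records how many criticality levels share it; the actual mechanisms are described in the paper itself.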