Message-driven confidence-driven (MDCD) error containment and recovery is a low-cost approach to mitigating the effects of software design faults in distributed embedded systems, developed for onboard guarded software upgrading in deep-space missions. In this paper, we first describe and verify the MDCD algorithms, in which we introduce the notion of "confidence-driven" to complement the "communication-induced" approach that a number of existing checkpointing protocols employ to achieve efficient error containment and recovery. We then conduct a model-based analysis to show that the algorithms incur low performance overhead. Finally, we discuss the advantages of the MDCD approach and its potential utility as a general-purpose, low-cost software fault tolerance technique for distributed embedded computing.