This paper proposes a novel methodology and an architectural framework for handling multiple classes of faults in applications built from commercial off-the-shelf (COTS) and legacy components: hardware-induced software errors in the application, process and/or host crashes or hangs, and errors in the system's persistent stable storage. The basic idea is to use an evidence-accruing fault tolerance manager that chooses and carries out one of several fault recovery strategies, depending on the perceived severity of the fault. The methodology and the framework have been applied to a case study consisting of a legacy system that uses a COTS DBMS for persistent storage. A thorough performability analysis has also been conducted by combining direct measurements with analytical modeling. Experimental results demonstrate that effective fault treatment, consisting of careful diagnosis and damage assessment, plays a key role in improving the dependability of COTS and legacy-based applications.
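The abstract does not spell out how the fault tolerance manager accrues evidence or maps severity to a recovery strategy. The following is only a minimal sketch of one plausible scheme, assuming a per-component score that grows with each observed error and decays with each error-free observation, compared against fixed thresholds. The class name, the decay factor, the threshold values, and the three strategy names are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class EvidenceAccruingManager:
    """Hypothetical manager: accrues error evidence per component and
    picks a recovery strategy from the accumulated score."""

    # Assumed decay applied to the score on every error-free observation.
    decay: float = 0.5
    # Assumed (threshold, strategy) pairs, checked from most to least severe.
    thresholds: tuple = (
        (3.0, "failover_host"),
        (1.5, "restart_process"),
    )
    # Assumed mildest strategy, used when no threshold is reached.
    default_action: str = "retry_operation"
    scores: dict = field(default_factory=dict)

    def observe(self, component: str, error: bool) -> float:
        """Record one observation and return the updated score."""
        score = self.scores.get(component, 0.0)
        # An error adds one unit of evidence; a clean observation decays it.
        score = score + 1.0 if error else score * self.decay
        self.scores[component] = score
        return score

    def choose_recovery(self, component: str) -> str:
        """Return the strategy for the first threshold the score reaches."""
        score = self.scores.get(component, 0.0)
        for limit, action in self.thresholds:
            if score >= limit:
                return action
        return self.default_action


if __name__ == "__main__":
    manager = EvidenceAccruingManager()
    # Three errors in a row followed by two clean observations.
    for had_error in (True, True, True, False, False):
        manager.observe("dbms", had_error)
        print(manager.choose_recovery("dbms"))
```

Under these assumptions, isolated errors lead to the mildest action, while repeated errors on the same component escalate to progressively more disruptive recovery. The decay keeps a burst of old errors from penalizing a component indefinitely.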