Survivable systems must identify and isolate damage as quickly as possible so that, under a malicious attack, an infection does not spread into an epidemic. Any delay in fault detection and isolation may lead to system unavailability, which is unacceptable in mission-critical applications. This paper presents a model for damage assessment, fault identification, and advance warning. The objective is to help confine damage propagation, whether direct or transitive, while the system survives ongoing attacks and performs the necessary self-healing. Our major contribution is a study of the patterns of communication among interconnected applications and of the use of communication graphs in damage identification and containment.
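
To make the role of a communication graph concrete, the following is a minimal sketch, not the model presented in the paper. It assumes that applications are nodes, that a directed edge means "sends data to", and that damage can propagate only along such edges. The names `damage_scope` and `containment_cut` and the example applications are hypothetical, chosen only for illustration.

```python
from collections import deque


def damage_scope(comm_graph, compromised):
    """Breadth-first walk from the compromised application.

    Returns {app: hops}. Hop count 1 marks direct exposure; a larger
    hop count marks transitive exposure through intermediate apps.
    """
    hops = {compromised: 0}
    queue = deque([compromised])
    while queue:
        app = queue.popleft()
        for peer in comm_graph.get(app, ()):
            if peer not in hops:
                hops[peer] = hops[app] + 1
                queue.append(peer)
    return hops


def containment_cut(comm_graph, compromised):
    """Outgoing edges of the compromised app (assumed containment step:
    blocking these edges stops further spread from that app)."""
    return [(compromised, peer) for peer in comm_graph.get(compromised, ())]


# Hypothetical communication graph: sender -> list of receivers.
comm_graph = {
    "web": ["auth", "cache"],
    "auth": ["db"],
    "cache": [],
    "db": ["backup"],
    "backup": [],
    "billing": ["db"],
}

scope = damage_scope(comm_graph, "auth")
direct = sorted(app for app, h in scope.items() if h == 1)
transitive = sorted(app for app, h in scope.items() if h > 1)
```

In this sketch, applications that the compromised node cannot reach (here `web`, `cache`, and `billing`) fall outside the damage scope. The applications in `transitive` are natural candidates for advance warning, since damage can reach them only after passing through a directly exposed application.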