In a clustered environment, a large number of independent components cooperate or collaborate on a computation. Any of these components can fail at any time, resulting in erroneous output. Many techniques have been developed to provide resilience to these kinds of faults. This paper discusses various fault-tolerance techniques. The success of cluster computing rests on low-cost, widely used commercial off-the-shelf (COTS) hardware and software.