Failures of all kinds occur, from the loss of a single network packet to site-wide disasters. Because businesses rely heavily on their data, failures must take minimal time and effort to repair, and the service interruption during the failure and repair period must be as short as possible. Ideally, then, a system should repair itself and involve humans in the repair process only when absolutely necessary. This paper describes one component of such a self-healing storage system: the component that automatically restores access to data when power returns after a large-scale outage. Our failure recovery protocol is part of a suite of modular protocols that make up the Palladio distributed storage system. The protocol guarantees that service is restored quickly and automatically once enough of the failures have been repaired.
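The abstract does not say what "enough" repaired failures means. One common interpretation in replicated storage is a majority quorum: a data item becomes accessible again once a strict majority of its replicas is back up. The sketch below models only that hypothetical rule. `ReplicaGroup`, `has_quorum`, and `recovery_point` are illustrative names, and the majority threshold is an assumption, not Palladio's documented condition.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ReplicaGroup:
    """Hypothetical model of one data item replicated on several storage nodes.

    ASSUMPTION: access is restored once a strict majority of replicas is up.
    The abstract does not state Palladio's actual recovery condition.
    """
    members: frozenset[str]
    up: set[str] = field(default_factory=set)

    def mark_repaired(self, node: str) -> None:
        # A node that has rebooted (e.g. after power returns) rejoins the group.
        if node not in self.members:
            raise ValueError(f"unknown replica: {node}")
        self.up.add(node)

    def mark_failed(self, node: str) -> None:
        # A node that fails again drops out of the live set.
        self.up.discard(node)

    def has_quorum(self) -> bool:
        # Strict majority: any two majorities share at least one replica,
        # so at most one side of a network partition can resume service.
        return 2 * len(self.up) > len(self.members)


def recovery_point(group: ReplicaGroup, boot_order: list[str]) -> int | None:
    """Return how many replicas must reboot, in the given order, before the
    group regains access, or None if those reboots never reach a majority."""
    for count, node in enumerate(boot_order, start=1):
        group.mark_repaired(node)
        if group.has_quorum():
            return count
    return None
```

For example, with five replicas `{"a", "b", "c", "d", "e"}` and the boot order `["c", "a", "e"]`, access returns after the third reboot, because three of five is the first strict majority. A `None` result corresponds, under this assumption, to the situation in which automatic recovery cannot proceed on its own.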