Optimistic recovery in distributed systems
ACM Transactions on Computer Systems (TOCS)
Efficient distributed recovery using message logging
Proceedings of the eighth annual ACM Symposium on Principles of distributed computing
Optimum checkpoints with age dependent failures
Acta Informatica
Recovery in distributed systems using optimistic message logging and check-pointing
Journal of Algorithms
Manetho: Transparent Roll Back-Recovery with Low Overhead, Limited Rollback, and Fast Output Commit
IEEE Transactions on Computers - Special issue on fault-tolerant computing
Distributed snapshots: determining global states of distributed systems
ACM Transactions on Computer Systems (TOCS)
Byzantine generals in action: implementing fail-stop processors
ACM Transactions on Computer Systems (TOCS)
Time, clocks, and the ordering of events in a distributed system
Communications of the ACM
Extended Hypercube: A Hierarchical Interconnection Network of Hypercubes
IEEE Transactions on Parallel and Distributed Systems
Hi-index | 0.00 |
Recovering from processor failures is an important problem in the design and development of reliable systems. We present a concurrent rollback algorithm in extended hypercube networks to recover from crash failures which involves small message and time complexities. The network of an extended hypercube is a hierarchical, low diameter, recursive structure. By appending only O(1) additional information to each message, we use less than O(Nlog N) message exchanges and O(log/sup 2/ N) time elapsed for recovery work where N is the number of processors of the extended hypercube network. The algorithms can be used to recover from the failure of an arbitrary number of processors.