Concurrent rollback for crash recovery in extended hypercube networks

Authors:
Tong-Ying Juang; C. P. Chiu; Kun-Ming Yu
Venue:
PAS '95 Proceedings of the First Aizu International Symposium on Parallel Algorithms/Architecture Synthesis
Year:
1995

Abstract

Recovering from processor failures is an important problem in the design and development of reliable systems. We present a concurrent rollback algorithm for extended hypercube networks that recovers from crash failures with low message and time complexity. An extended hypercube network is a hierarchical, recursive structure with low diameter. By appending only O(1) additional information to each message, recovery takes fewer than O(N log N) message exchanges and O(log^2 N) time, where N is the number of processors in the extended hypercube network. The algorithm can be used to recover from the failure of an arbitrary number of processors.
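
To give a feel for the scale of the stated bounds, the following minimal sketch evaluates N log N and log^2 N for a few network sizes. It is an illustration only and not part of the paper: the function name `recovery_bounds` is hypothetical, the logarithm is assumed to be base 2, and the constant factors hidden by the O-notation are ignored, so the numbers are growth terms rather than actual message or time counts.

```python
import math


def recovery_bounds(n):
    """Return the growth terms (N * log2 N, (log2 N)^2) for n processors.

    Assumption: base-2 logarithm and unit constants; the abstract only
    states asymptotic bounds, so these are not real message/time counts.
    """
    if n < 2:
        raise ValueError("need at least 2 processors")
    log_n = math.log2(n)
    return n * log_n, log_n ** 2


if __name__ == "__main__":
    # Tabulate both growth terms for a few illustrative network sizes.
    for n in (64, 1024, 16384):
        msgs, steps = recovery_bounds(n)
        print(f"N={n:>6}  N log N={msgs:>10.0f}  log^2 N={steps:>5.0f}")
```

The message term grows nearly linearly in N while the time term grows only polylogarithmically, which is the sense in which the abstract describes both complexities as small.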