This paper proposes a new approach to rollback-recovery that uses multiple agents in a distributed computing system. Previous rollback-recovery protocols depended on the underlying communication system and operating system, which degraded computing performance in distributed systems. Using multiple agents, we propose a rollback-recovery protocol that works independently of the operating system. We define three kinds of agents. The first is a recovery agent, which performs the rollback-recovery protocol after a failure. The second is an information agent, which constructs domain knowledge, in the form of fault-tolerance rules and information, during failure-free operation. The third is a facilitator agent, which controls efficient communication between agents. We also simulate the proposed rollback-recovery protocol using Java and an agent communication language in a CORBA environment.
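The division of labor among the three agents can be illustrated with a minimal sketch. It is not the protocol from the paper: it is written in Python rather than Java and CORBA, and the class names, the message fields, the "p0" process identifier, and the use of the KQML-style performatives "tell" and "ask-one" are all assumptions made for illustration. The information agent stores checkpoints during failure-free operation, the recovery agent asks for the latest one after a failure, and every message passes through the facilitator.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical message format, loosely modelled on KQML performatives."""
    performative: str  # "tell" stores information, "ask-one" queries it
    sender: str
    receiver: str
    content: dict


class Facilitator:
    """Routes messages between registered agents by name."""

    def __init__(self):
        self._agents = {}

    def register(self, name, agent):
        self._agents[name] = agent

    def send(self, message):
        # Deliver the message and pass the receiver's reply back to the sender.
        return self._agents[message.receiver].handle(message)


class InformationAgent:
    """Collects recovery information during failure-free operation."""

    def __init__(self):
        self._checkpoints = {}  # process id -> list of saved states

    def handle(self, message):
        pid = message.content["pid"]
        if message.performative == "tell":
            self._checkpoints.setdefault(pid, []).append(message.content["state"])
            return None
        if message.performative == "ask-one":
            saved = self._checkpoints.get(pid)
            return saved[-1] if saved else None
        raise ValueError(f"unsupported performative: {message.performative}")


class RecoveryAgent:
    """Runs after a failure and fetches the state to roll back to."""

    def __init__(self, name, facilitator, info_agent_name):
        self.name = name
        self.facilitator = facilitator
        self.info_agent_name = info_agent_name

    def recover(self, pid):
        query = Message("ask-one", self.name, self.info_agent_name, {"pid": pid})
        return self.facilitator.send(query)


def demo():
    facilitator = Facilitator()
    facilitator.register("info", InformationAgent())
    recovery = RecoveryAgent("recovery", facilitator, "info")

    # Failure-free operation: process "p0" reports two checkpoints.
    for state in ({"step": 1}, {"step": 2}):
        facilitator.send(Message("tell", "p0", "info", {"pid": "p0", "state": state}))

    # After a failure of "p0", the recovery agent asks for the latest checkpoint.
    return recovery.recover("p0")
```

Because the agents exchange only messages through the facilitator, nothing in the sketch relies on operating-system facilities, which mirrors the independence the abstract aims for. A real protocol would also have to handle message logging and consistency across processes, which this sketch omits.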