Large-scale distributed applications, such as online information retrieval and collaboration over computational elements, demand self-managed computing systems that need minimal human intervention. However, large scale and full distribution often lead to poor system dependability and security, and make it harder to manage and control redundancy for fault tolerance. In particular, fault tolerance schemes that let mobile agents survive agent server crash failures in an autonomic environment are complex, since developers normally have no control over remote agent servers. Some solutions inject a replica into stable storage upon the agent's arrival at an agent server, but if the agent server crashes, the replica is unavailable until the agent server recovers. In this paper we present a failure model and an exception handling framework for mobile agent systems, and develop an exception handling scheme that lets mobile agents survive agent server crash failures. A replica mobile agent operates at the agent server visited prior to its master's current location. If a master crashes, its replica is available as a replacement. The proposed scheme is compared with a simple time-out scheme. Experimental evaluation shows that the scheme adds some overhead to the round trip time when fault tolerance measures are exercised. However, the scheme offers the advantage that fault tolerance is provided during the mobile agent trip, i.e., in the event of an agent server crash, not all agent servers have to be revisited.
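
The trailing-replica idea described above can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the paper's implementation: the names `Trip`, `trailing_replica_trip`, and `timeout_restart_trip` are hypothetical, crashes are modelled as a fixed set of server names, and the time-out baseline is assumed to restart the whole trip from the home server while skipping servers already known to have crashed. Only migrations of the agent acting as master are counted as hops.

```python
from dataclasses import dataclass, field


@dataclass
class Trip:
    visited: list = field(default_factory=list)     # servers where work completed
    recoveries: list = field(default_factory=list)  # (crashed server, replica location)
    hops: int = 0                                   # migrations by the acting master


def trailing_replica_trip(itinerary, crashed):
    """Master walks the itinerary; a replica stays one server behind it."""
    trip = Trip()
    replica_at = "home"  # assumption: the replica starts at the home server
    for server in itinerary:
        trip.hops += 1
        if server in crashed:
            # The master is lost with the server. The replica is promoted
            # and continues with the next server, so nothing is revisited.
            trip.recoveries.append((server, replica_at))
            continue
        trip.visited.append(server)
        replica_at = server  # replica now trails at the server just completed
    return trip


def timeout_restart_trip(itinerary, crashed):
    """Assumed baseline: on time-out, home relaunches the agent from the start."""
    trip = Trip()
    known_bad = set()
    while True:
        trip.visited = []
        for server in itinerary:
            if server in known_bad:
                continue
            trip.hops += 1
            if server in crashed:
                known_bad.add(server)  # home learns of the crash via time-out
                break
            trip.visited.append(server)
        else:
            return trip


if __name__ == "__main__":
    route = ["A", "B", "C", "D"]
    print(trailing_replica_trip(route, {"C"}))
    print(timeout_restart_trip(route, {"C"}))
```

In this toy model, both schemes finish the work on the surviving servers, but the assumed restart baseline repeats the hops to servers it had already visited, whereas the trailing replica resumes from the server just before the crash.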