To construct and deploy massively multiagent systems, we must address one of the fundamental issues of distributed systems: the possibility of partial failures. In this paper, we discuss the issues involved and propose an approach to fault tolerance for massively multiagent systems. The starting idea is to apply replication strategies to agents. Because the criticality of an agent may change during the course of computation and problem solving, and because resources are bounded, the number of replicas of each agent must be adapted dynamically and automatically in order to maximize reliability and availability. We describe our approach and the related mechanisms for evaluating the criticality of a given agent and for setting its replication parameters (e.g., the number of replicas). We also report on experiments conducted with our prototype architecture, named DarX.
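As an illustration of the core idea, the following minimal Python sketch splits a bounded replica budget across agents in proportion to their current criticality. The abstract does not state the allocation rule, so proportional allocation with largest-remainder rounding is an assumption made here for illustration. The function name `allocate_replicas` and its parameters are hypothetical and are not part of DarX.

```python
from math import floor


def allocate_replicas(criticality, total_replicas, min_replicas=1):
    """Split a bounded replica budget across agents in proportion to criticality.

    Hypothetical helper: the proportional rule is an assumption, not DarX's API.
    criticality maps an agent name to a non-negative criticality score.
    """
    agents = list(criticality)
    if total_replicas < min_replicas * len(agents):
        raise ValueError("budget too small to give every agent min_replicas")

    # Every agent first receives the guaranteed minimum.
    allocation = {a: min_replicas for a in agents}
    spare = total_replicas - min_replicas * len(agents)
    total_crit = sum(criticality.values())
    if spare == 0 or total_crit <= 0:
        return allocation

    # Ideal (fractional) share of the spare replicas for each agent.
    shares = {a: spare * criticality[a] / total_crit for a in agents}
    for a in agents:
        allocation[a] += floor(shares[a])

    # Hand out what is left by largest fractional remainder.
    leftover = total_replicas - sum(allocation.values())
    by_remainder = sorted(
        agents, key=lambda a: shares[a] - floor(shares[a]), reverse=True
    )
    for a in by_remainder[:leftover]:
        allocation[a] += 1
    return allocation


if __name__ == "__main__":
    # Criticality scores are made-up inputs; re-running the function after
    # the scores change models the dynamic adaptation step.
    print(allocate_replicas({"planner": 0.7, "monitor": 0.2, "logger": 0.1}, 6))
```

Calling the function again whenever criticality estimates are refreshed gives a simple form of dynamic adaptation: the budget stays fixed while replicas move toward the agents that currently matter most.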