To construct and deploy large-scale multi-agent systems, we must address one of the fundamental issues of distributed systems: the possibility of partial failures. Fault tolerance is therefore unavoidable for large-scale multi-agent systems. In this paper, we discuss the issues involved and propose an approach to fault tolerance for multi-agent systems. The starting idea is to apply replication strategies to agents, replicating the most critical agents to guard against failures. Because the criticality of agents may evolve during the course of computation and problem solving, and because resources are bounded, the number of replicas of each agent must be adapted dynamically and automatically in order to maximize their reliability and availability. We describe our approach and the related mechanisms for evaluating the criticality of a given agent (based on application-level semantic information, e.g., interdependences, and on system-level statistical information, e.g., communication load) and for deciding which strategy to apply (e.g., active or passive replication) and how to parameterize it (e.g., number of replicas). We also report on experiments conducted with our prototype architecture, named DimaX.
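The idea of adapting replica counts to criticality under a bounded resource budget can be illustrated with a minimal sketch. It is not the mechanism implemented in DimaX. The `AgentStats` fields, the linear blend of the two signals, the `weight` parameter, and the proportional allocation rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class AgentStats:
    """Hypothetical per-agent inputs; the field names are illustrative only."""
    name: str
    dependents: int  # application-level signal: agents that depend on this one
    messages: int    # system-level signal: messages handled in the last window


def criticality(agent, max_dependents, max_messages, weight=0.5):
    """Blend normalized semantic and statistical signals into a score in [0, 1].

    The linear blend and its weight are assumptions, not the paper's formula.
    """
    semantic = agent.dependents / max_dependents if max_dependents else 0.0
    statistical = agent.messages / max_messages if max_messages else 0.0
    return weight * semantic + (1 - weight) * statistical


def allocate_replicas(agents, budget, weight=0.5):
    """Split a bounded budget of extra replicas in proportion to criticality.

    Every agent keeps its single primary copy; `budget` counts extra replicas.
    Largest-remainder rounding spends the whole budget whenever at least one
    agent has a non-zero score. If all scores are zero, nothing is replicated.
    """
    max_dep = max((a.dependents for a in agents), default=0)
    max_msg = max((a.messages for a in agents), default=0)
    scores = {a.name: criticality(a, max_dep, max_msg, weight) for a in agents}
    total = sum(scores.values())
    if total == 0:
        return {a.name: 1 for a in agents}

    # Ideal fractional share of the budget for each agent.
    shares = {name: budget * score / total for name, score in scores.items()}
    extras = {name: int(share) for name, share in shares.items()}

    # Hand out the replicas lost to rounding, largest fractional part first.
    leftover = budget - sum(extras.values())
    by_remainder = sorted(shares, key=lambda n: shares[n] - extras[n], reverse=True)
    for name in by_remainder[:leftover]:
        extras[name] += 1

    return {name: 1 + extra for name, extra in extras.items()}
```

Calling `allocate_replicas` again whenever the observed dependencies or message counts change gives a simple form of dynamic adaptation: replicas move toward the agents whose score has risen, and the total stays within the budget.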