Adaptive replication of large-scale multi-agent systems – towards a fault-tolerant multi-agent platform

Authors:
Zahia Guessoum;Nora Faci;Jean-Pierre Briot
Affiliations:
LIP6, Université Pierre et Marie Curie (Paris 6), Paris, France;MODECO-CReSTIC – IUT de Reims, Reims, France;LIP6, Université Pierre et Marie Curie (Paris 6), Paris, France
Venue:
Software Engineering for Multi-Agent Systems IV
Year:
2006


Abstract

To construct and deploy large-scale multi-agent systems, we must address one of the fundamental issues of distributed systems: the possibility of partial failures. Fault tolerance is therefore an unavoidable concern for large-scale multi-agent systems. In this paper, we discuss these issues and propose an approach for supporting fault tolerance in multi-agent systems. The starting idea is to apply replication strategies to agents, with the most critical agents being replicated to guard against failures. Because the criticality of agents may evolve during the course of computation and problem solving, and because resources are bounded, the number of replicas of each agent must be adapted dynamically and automatically in order to maximize reliability and availability. We describe our approach and the related mechanisms for evaluating the criticality of a given agent (based on application-level semantic information, e.g., interdependences, and on system-level statistical information, e.g., communication load) and for deciding which strategy to apply (e.g., active or passive replication) and how to parameterize it (e.g., number of replicas). We also report on experiments conducted with our prototype architecture, named DimaX.
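
The idea of adapting replica counts to agent criticality under a bounded resource budget can be illustrated with a minimal sketch. It is not the DimaX implementation: the names `AgentStats`, `criticality`, and `allocate_replicas`, the equal weighting of the two information sources, and the proportional (largest-remainder) allocation rule are all assumptions made for illustration. The choice between active and passive replication is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class AgentStats:
    """Hypothetical per-agent observations (not taken from the paper)."""
    name: str
    dependents: int  # application-level: number of agents depending on this one
    messages: int    # system-level: messages handled in the last observation window


def criticality(agent, max_dependents, max_messages, w_semantic=0.5):
    """Combine both information sources into a score in [0, 1].

    The linear weighting is an assumption chosen for simplicity.
    """
    semantic = agent.dependents / max_dependents if max_dependents else 0.0
    load = agent.messages / max_messages if max_messages else 0.0
    return w_semantic * semantic + (1.0 - w_semantic) * load


def allocate_replicas(agents, budget, w_semantic=0.5):
    """Split a fixed replica budget among agents in proportion to criticality.

    Uses the largest-remainder method so the counts sum exactly to `budget`
    whenever at least one agent has a non-zero score.
    """
    max_dep = max((a.dependents for a in agents), default=0)
    max_msg = max((a.messages for a in agents), default=0)
    scores = {a.name: criticality(a, max_dep, max_msg, w_semantic) for a in agents}

    total = sum(scores.values())
    if total == 0:
        # No agent is measurably critical: allocate no extra replicas.
        return {a.name: 0 for a in agents}

    quotas = {name: budget * score / total for name, score in scores.items()}
    allocation = {name: int(quota) for name, quota in quotas.items()}

    # Hand out the replicas lost to rounding, largest fractional part first.
    leftover = budget - sum(allocation.values())
    by_remainder = sorted(quotas, key=lambda n: quotas[n] - allocation[n], reverse=True)
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation


if __name__ == "__main__":
    observed = [
        AgentStats("A", dependents=4, messages=100),
        AgentStats("B", dependents=2, messages=50),
        AgentStats("C", dependents=0, messages=0),
    ]
    print(allocate_replicas(observed, budget=6))
```

Re-running `allocate_replicas` on fresh observations at each monitoring interval would let the replica counts follow changes in criticality over time, which is the adaptive behavior the abstract calls for. How often to re-evaluate, and how to weight the two information sources, are left open in this sketch.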