Computing the fault tolerance of multi-agent deployment

Authors:
Yingqian Zhang;Efrat Manisterski;Sarit Kraus;V. S. Subrahmanian;David Peleg
Affiliations:
Faculty of Electrical Engineering, Mathematics, and Computer Science, Delft University of Technology, 2628 CD Delft, The Netherlands;Department of Computer Science, Bar-Ilan University, Ramat Gan, 52900 Israel;Department of Computer Science, Bar-Ilan University, Ramat Gan, 52900 Israel and Department of Computer Science & UMIACS, University of Maryland, College Park, MD 20742, USA;Department of Computer Science & UMIACS, University of Maryland, College Park, MD 20742, USA;Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel
Venue:
Artificial Intelligence
Year:
2009

Abstract

A deployment of a multi-agent system on a network refers to the placement of one or more copies of each agent on network hosts, in such a manner that the memory constraints of each node are satisfied. Finding the deployment that is most likely to tolerate faults (i.e., to have at least one copy of each agent functioning and in communication with other agents) is a challenge. In this paper, we address the problem of finding the probability of survival of a deployment (i.e., the probability that a deployment will tolerate faults), under the assumption that node failures are independent. We show that the problem of computing the survival probability of a deployment is at least NP-hard. Moreover, it is hard to approximate. We present two algorithms that compute the probability of survival of a deployment exactly; as expected, these algorithms take exponential time. We also present five heuristic algorithms that estimate survival probabilities; these algorithms run in acceptable time frames. We report on a detailed set of experiments to determine the conditions under which some of these algorithms perform better than the others.
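
To make the notion of survival probability concrete, the following is a minimal sketch and not one of the paper's algorithms. It assumes a simplified survival criterion: a deployment survives if every agent has at least one copy on a node that is still up, and the requirement that agents remain in communication is ignored. The names `survives`, `exact_survival`, and `sampled_survival`, the dictionary layout, and the failure probabilities are all hypothetical. The exact routine enumerates every up/down pattern of the nodes, which shows why exact computation grows exponentially with the number of nodes; the sampling routine is a generic Monte Carlo estimate shown only for contrast.

```python
from itertools import product
import random


def survives(deployment, alive):
    """Simplified criterion (assumption): every agent has a copy on a live node."""
    return all(any(node in alive for node in hosts)
               for hosts in deployment.values())


def exact_survival(deployment, fail_prob):
    """Sum the probability of every node up/down pattern in which the
    deployment survives. Node failures are treated as independent.
    Runs over 2**n patterns for n nodes."""
    nodes = sorted(fail_prob)
    total = 0.0
    for states in product([True, False], repeat=len(nodes)):
        pattern_prob = 1.0
        for node, up in zip(nodes, states):
            pattern_prob *= (1.0 - fail_prob[node]) if up else fail_prob[node]
        alive = {node for node, up in zip(nodes, states) if up}
        if survives(deployment, alive):
            total += pattern_prob
    return total


def sampled_survival(deployment, fail_prob, trials=20000, seed=0):
    """Generic Monte Carlo estimate: draw independent node failures and
    count the fraction of trials in which the deployment survives."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        alive = {node for node, q in fail_prob.items() if rng.random() >= q}
        if survives(deployment, alive):
            hits += 1
    return hits / trials


# Hypothetical example: two agents, three nodes, each node fails with probability 0.1.
deployment = {"agent_a": ["n1", "n2"], "agent_b": ["n2", "n3"]}
fail_prob = {"n1": 0.1, "n2": 0.1, "n3": 0.1}

print(exact_survival(deployment, fail_prob))
print(sampled_survival(deployment, fail_prob))
```

In this hypothetical example the deployment survives whenever n2 is up, or when n2 is down but both n1 and n3 are up, so the exact value can be checked by hand as 0.9 + 0.1 × 0.9 × 0.9 = 0.981.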