Dynamic non-Bayesian decision making
Journal of Artificial Intelligence Research
We consider a group of several non-Bayesian agents that can fully coordinate their activities and share their past experience in order to obtain a joint goal in the face of uncertainty. The reward obtained by each agent is a function of the environment state but not of the actions taken by the other agents in the group. The environment state (controlled by Nature) may change arbitrarily, and the reward function is initially unknown. Two basic feedback structures are considered. In the first, the perfect monitoring case, the agents are able to observe the previous environment state as part of their feedback; in the second, the imperfect monitoring case, all that is available to the agents are the rewards obtained. Both settings refer to partially observable processes, where the current environment state is unknown. Our study adopts the competitive ratio criterion. It is shown that, for the imperfect monitoring case, there exists an efficient stochastic policy that ensures that the competitive ratio is obtained for all agents at almost all stages with an arbitrarily high probability, where efficiency is measured in terms of rate of convergence. It is also shown that if the agents are restricted to deterministic policies, then no such policy exists, even in the perfect monitoring case.
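To make the competitive ratio criterion concrete, here is a minimal toy sketch (not the paper's algorithm) of a single agent under imperfect monitoring: the reward table is unknown, the agent observes only realized rewards, and a stochastic exploration phase precedes exploitation of the empirically best action. The function names, the explore-then-exploit structure, and the specific reward table are illustrative assumptions; the empirical competitive ratio compares the reward obtained against the best achievable reward at each stage in hindsight.

```python
import random

def simulate(rewards, horizon, explore_rounds, seed=0):
    """Illustrative sketch only: rewards[s][a] is the (initially unknown)
    reward for playing action a while Nature sets state s. The agent never
    observes s, only the realized reward (imperfect monitoring)."""
    rng = random.Random(seed)
    n_states, n_actions = len(rewards), len(rewards[0])
    totals = [0.0] * n_actions   # cumulative observed reward per action
    counts = [0] * n_actions     # number of times each action was played
    obtained, best_possible = 0.0, 0.0
    for t in range(horizon):
        s = rng.randrange(n_states)      # Nature picks the state arbitrarily
        if t < explore_rounds:
            a = rng.randrange(n_actions)  # stochastic exploration
        else:
            # exploit the action with the best empirical average so far
            a = max(range(n_actions),
                    key=lambda i: totals[i] / max(counts[i], 1))
        r = rewards[s][a]
        totals[a] += r
        counts[a] += 1
        obtained += r
        best_possible += max(rewards[s])  # best stage reward, in hindsight
    return obtained / best_possible       # empirical competitive ratio
```

With a reward table in which one action dominates in every state, the empirical competitive ratio approaches 1 as the horizon grows, since after exploration the agent plays the hindsight-optimal action at almost every stage; with deterministic play against an adversarial Nature, no such guarantee is possible, which is the gap the abstract's last sentence points to.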