Dynamic non-Bayesian decision making in multi-agent systems
Annals of Mathematics and Artificial Intelligence
The model of a non-Bayesian agent facing a repeated game with incomplete information against Nature is an appropriate tool for modeling general agent-environment interactions. In this model the environment state (controlled by Nature) may change arbitrarily, and the feedback/reward function is initially unknown. The agent is non-Bayesian: it forms a prior probability neither over Nature's state-selection strategy nor over its own reward function. A policy for the agent is a function that assigns an action to every history of observations and actions. Two basic feedback structures are considered. In the first, the perfect monitoring case, the agent observes the previous environment state as part of its feedback; in the second, the imperfect monitoring case, the agent observes only the reward obtained. Both settings are partially observable processes, in which the current environment state is unknown. Our main result concerns the competitive ratio criterion in the perfect monitoring case. We prove the existence of an efficient stochastic policy that guarantees the competitive ratio is attained at almost all stages with arbitrarily high probability, where efficiency is measured by the rate of convergence. We further show that no such optimal policy exists in the imperfect monitoring case. Moreover, we prove that in the perfect monitoring case no deterministic policy satisfies our long-run optimality criterion. In addition, we discuss the maxmin criterion and prove that under it an efficient deterministic optimal strategy does exist in the imperfect monitoring case. Finally, we show that our approach to long-run optimality can be viewed as qualitative, which distinguishes it from previous work in this area.
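To make the setting concrete, the following is a toy, self-contained sketch of the perfect monitoring case under heavily simplified assumptions: a small made-up reward table, a Nature that happens to hold the state fixed, and a naive explore-then-exploit stochastic policy. The names (`reward`, `ratio_fraction`, etc.) and the per-stage ratio bookkeeping are illustrative inventions, not the paper's formal definitions or its optimal policy, whose guarantees hold against arbitrary state sequences.

```python
import random

random.seed(0)

STATES = [0, 1, 2]
ACTIONS = [0, 1, 2]

# Nature's reward function, initially unknown to the agent.
# Here action s is best in state s (an arbitrary toy choice).
def reward(state, action):
    return 1.0 if action == state else 0.1

def best_reward(state):
    return max(reward(state, a) for a in ACTIONS)

def run(T=1000, explore=60):
    """History-based stochastic policy under perfect monitoring:
    play uniformly at random for the first `explore` stages, then
    repeat the best action observed for the last revealed state."""
    learned = {}          # (state, action) -> observed reward
    last_state = None
    achieved = []         # (realized state, obtained reward) per stage
    for t in range(T):
        state = 1         # Nature keeps the state fixed in this toy run
        if t < explore or last_state is None:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS,
                         key=lambda a: learned.get((last_state, a), 0.0))
        r = reward(state, action)
        learned[(state, action)] = r
        achieved.append((state, r))
        last_state = state  # perfect monitoring: previous state revealed
    return achieved

def ratio_fraction(achieved, rho=0.9):
    """Fraction of stages at which the agent obtained at least rho
    times the best reward available in the realized state."""
    good = sum(1 for s, r in achieved if r >= rho * best_reward(s))
    return good / len(achieved)

frac = ratio_fraction(run())
```

In this benign run the exploitation phase locks onto the optimal action, so the ratio is attained at almost all stages; the point of the paper's result is that a (more sophisticated) stochastic policy achieves this kind of guarantee even when Nature changes the state adversarially, while no deterministic policy can.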