A near-optimal poly-time algorithm for learning in a class of stochastic games

Authors: Ronen I. Brafman; Moshe Tennenholtz
Affiliations: Dept. of Math and Computer Science, Ben-Gurion University, Beer-Sheva, Israel; Faculty of Industrial Eng. and Management, Technion, Haifa, Israel
Venue: IJCAI'99: Proceedings of the 16th International Joint Conference on Artificial Intelligence, Volume 2
Year: 1999

References (3)

Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm. ICML '98: Proceedings of the Fifteenth International Conference on Machine Learning.
Near-Optimal Reinforcement Learning in Polynomial Time. ICML '98: Proceedings of the Fifteenth International Conference on Machine Learning.
Dynamic non-Bayesian decision making. Journal of Artificial Intelligence Research.

Cited by (1)

Fast planning in stochastic games. UAI'00: Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence.


Abstract

We present a new algorithm for polynomial-time learning of near-optimal behavior in stochastic games. This algorithm incorporates and integrates important recent results of Kearns and Singh [1998] in reinforcement learning and of Monderer and Tennenholtz [1997] in repeated games. In stochastic games we face an exploration vs. exploitation dilemma that is more complex than the one in Markov decision processes: given information about particular parts of a game matrix, how much effort should the agent invest in learning its unknown parts? We explain and address these issues within the class of single-controller stochastic games. This solution can be extended to stochastic games in general.
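The abstract does not spell out the algorithm's mechanics, so the following is only a minimal illustration of the dilemma it describes, not the authors' method. It assumes a zero-sum matrix game in which unobserved payoffs are marked `None`, all payoffs lie in a known range `[r_min, r_max]`, and the agent is restricted to pure strategies. Filling the unknown entries optimistically (with `r_max`) and pessimistically (with `r_min`) gives two security (maximin) values; their gap bounds how much learning the unknown entries could raise the payoff the agent can already guarantee. All function names here are hypothetical.

```python
from typing import List, Optional, Sequence

# Payoff matrix for the row agent; None marks an entry not yet observed.
PartialMatrix = Sequence[Sequence[Optional[float]]]


def fill_unknown(matrix: PartialMatrix, value: float) -> List[List[float]]:
    """Replace every unobserved (None) entry with a fixed assumed payoff."""
    return [[value if x is None else x for x in row] for row in matrix]


def pure_maximin(matrix: Sequence[Sequence[float]]) -> float:
    """Security value over pure strategies: the best row's worst-case payoff."""
    return max(min(row) for row in matrix)


def exploration_gain(matrix: PartialMatrix, r_min: float, r_max: float) -> float:
    """Upper bound on how much learning the unknown entries could raise
    the pure-strategy security value above what is already guaranteed."""
    optimistic = pure_maximin(fill_unknown(matrix, r_max))
    pessimistic = pure_maximin(fill_unknown(matrix, r_min))
    return optimistic - pessimistic


def optimistic_row(matrix: PartialMatrix, r_max: float) -> int:
    """Row to play if every unknown entry is assumed to pay r_max."""
    filled = fill_unknown(matrix, r_max)
    return max(range(len(filled)), key=lambda i: min(filled[i]))


if __name__ == "__main__":
    # Hypothetical 2x2 game: the payoff for (row 0, column 1) is unknown.
    known = [[3.0, None], [1.0, 2.0]]
    print(exploration_gain(known, r_min=0.0, r_max=5.0))  # 2.0
    print(optimistic_row(known, r_max=5.0))  # 0
```

In this example the pessimistic guarantee of 1.0 comes from row 1, while the optimistic choice is row 0, the row containing the unknown entry: playing it either yields the optimistic value of 3.0 (if the opponent picks column 0) or reveals the missing payoff (if it picks column 1). Mixed strategies, state transitions, and the single-controller structure are deliberately left out of this sketch.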