A statistical property of multiagent learning based on Markov decision process

Authors:
K. Iwata; K. Ikeda; H. Sakai
Affiliations:
Fac. of Inf. Sci., Hiroshima City Univ., Japan;-;-
Venue:
IEEE Transactions on Neural Networks
Year:
2006

Citing 0
Cited 4

Information Geometry and Information Theory in Machine Learning
Neural Information Processing

An Information-Theoretic Class of Stochastic Decision Processes
WI-IAT '08 Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 02

An information-spectrum approach to analysis of return maximization in reinforcement learning
ICONIP'10 Proceedings of the 17th international conference on Neural information processing: theory and algorithms - Volume Part I

An information-theoretic analysis of return maximization in reinforcement learning
Neural Networks

Abstract

We exhibit an important property, the asymptotic equipartition property (AEP), of empirical sequences in an ergodic multiagent Markov decision process (MDP). Using the AEP, which facilitates the analysis of multiagent learning, we give a statistical property of multiagent learning methods, such as reinforcement learning (RL), near the end of the learning process. We examine how the conditions among the agents affect the achievement of a cooperative policy in three cases: blind, visible, and communicable. We also derive a bound on the speed with which the empirical sequence converges in probability to the best sequence, so that multiagent learning yields the best cooperative result.
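
As a rough illustration of what an AEP says, the sketch below uses the classical AEP for a single ergodic Markov chain, not the multiagent MDP formulation of the paper. For a long sample path, -(1/n) log P(x_1, ..., x_n) concentrates near the entropy rate of the chain. The two-state transition matrix and all function names are hypothetical, chosen only for this example.

```python
import math
import random


def stationary_two_state(P):
    """Stationary distribution of a 2-state chain with transition matrix P."""
    a, b = P[0][1], P[1][0]  # probabilities of leaving state 0 and state 1
    return [b / (a + b), a / (a + b)]


def entropy_rate(P, pi):
    """Entropy rate H = -sum_i pi_i sum_j P_ij log P_ij, in nats."""
    return -sum(
        pi[i] * P[i][j] * math.log(P[i][j])
        for i in range(len(P))
        for j in range(len(P))
        if P[i][j] > 0
    )


def sample_neg_log_rate(P, pi, n, rng):
    """Draw one length-n path and return -(1/n) log of its probability."""
    x = 0 if rng.random() < pi[0] else 1  # initial state drawn from pi
    log_p = math.log(pi[x])
    for _ in range(n - 1):
        y = 0 if rng.random() < P[x][0] else 1  # next state given x
        log_p += math.log(P[x][y])
        x = y
    return -log_p / n


if __name__ == "__main__":
    P = [[0.9, 0.1], [0.4, 0.6]]  # hypothetical ergodic transition matrix
    pi = stationary_two_state(P)
    h = entropy_rate(P, pi)
    rng = random.Random(0)
    for n in (100, 1000, 10000, 100000):
        rate = sample_neg_log_rate(P, pi, n, rng)
        print(n, round(rate, 4), "vs entropy rate", round(h, 4))
```

As n grows, the printed per-symbol rate should settle near the entropy rate. This concentration is what makes "typical" empirical sequences a convenient object for analyzing behavior late in learning.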