We discuss an important property of empirical sequences in reinforcement learning: the asymptotic equipartition property. It states that, when the number of time steps is sufficiently large, the typical set of empirical sequences has probability nearly one, all elements of the typical set are nearly equiprobable, and the number of elements in the typical set grows exponentially in the sum of conditional entropies. We refer to this sum as the stochastic complexity. Using this property, we show that return maximization depends on two factors: the stochastic complexity and a quantity determined by the parameters of the environment. Here, return maximization means that the sequences that are best in terms of expected return have probability one. We also examine the sensitivity of the stochastic complexity, which provides a qualitative guide for tuning the parameters of the action-selection strategy, and give a sufficient condition for return maximization in probability.
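The asymptotic equipartition property underlying this analysis can be illustrated numerically. The sketch below is not taken from the paper; it uses the simplest possible source, an i.i.d. Bernoulli(p) sequence rather than the paper's empirical state-action sequences, to show the core phenomenon: the per-symbol log-probability -(1/n) log2 P(sequence) concentrates near the entropy H(p), so long sequences split into a typical set of nearly equiprobable elements whose size is roughly 2^(n*H). All names and parameter values here are illustrative assumptions.

```python
import math
import random

random.seed(0)


def entropy(p):
    # Binary entropy H(p) in bits.
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def neg_log_prob_rate(seq, p):
    # -(1/n) log2 of the probability of this particular binary sequence
    # under an i.i.d. Bernoulli(p) model.
    n = len(seq)
    ones = sum(seq)
    return -(ones * math.log2(p) + (n - ones) * math.log2(1 - p)) / n


# Illustrative parameters (assumed, not from the paper): source bias p,
# sequence length n, number of sampled sequences, and typicality margin eps.
p, n, trials, eps = 0.3, 10_000, 200, 0.05
H = entropy(p)

rates = [
    neg_log_prob_rate([1 if random.random() < p else 0 for _ in range(n)], p)
    for _ in range(trials)
]

# Fraction of sampled sequences falling in the eps-typical set,
# i.e. whose per-symbol log-probability is within eps of H(p).
typical_fraction = sum(abs(r - H) < eps for r in rates) / trials
print(f"H(p) = {H:.4f}, fraction of sampled sequences that are typical: {typical_fraction:.2f}")
```

For these parameters essentially every sampled sequence is typical, which mirrors the abstract's statement that the typical set has probability nearly one; in the paper's setting the role of H(p) is played by the sum of conditional entropies, i.e. the stochastic complexity.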